
MediaWiki is incapable of deleting files with too many versions
Open, Needs Triage, Public

Description

In T423823, we were asked to delete a file that had nearly 5k versions uploaded. Since the limit defined by FileOpBatch is just 1000, the following error message appeared:

[Screenshot of the error message: image.png]

Unlike for big pages, nothing like bigdelete or DeleteRevisionsBatchSize exists for files, which makes the deletion tricky to carry out. I ended up using an ad hoc JS script to delete most versions individually (just to get the count under 1000), after which the regular MediaWiki deletion feature worked as expected.

However, there really should be either some sort of batching (similar to DeleteRevisionsBatchSize), or there should be an upper limit on the number of versions one file can have.
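
As a rough illustration of the batching idea, here is a minimal Python sketch that splits a file's versions into chunks no larger than a limit and hands each chunk to a deletion callback. It is not MediaWiki code: `MAX_OPS_PER_BATCH`, `batch_versions`, `delete_in_batches`, and the `delete_batch` callback are all hypothetical names, and the value 1000 is only taken from the limit mentioned above.

```python
from typing import Callable, Iterator, Sequence

# Assumption: mirrors the 1000 limit mentioned in the description.
# The constant name is hypothetical, not a MediaWiki setting.
MAX_OPS_PER_BATCH = 1000


def batch_versions(
    versions: Sequence[str], batch_size: int = MAX_OPS_PER_BATCH
) -> Iterator[Sequence[str]]:
    """Yield consecutive chunks of at most batch_size versions."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(versions), batch_size):
        yield versions[start:start + batch_size]


def delete_in_batches(
    versions: Sequence[str],
    delete_batch: Callable[[Sequence[str]], None],
    batch_size: int = MAX_OPS_PER_BATCH,
) -> int:
    """Call delete_batch (a hypothetical callback) once per chunk.

    Returns the number of batches that were processed.
    """
    batches = 0
    for chunk in batch_versions(versions, batch_size):
        delete_batch(chunk)
        batches += 1
    return batches
```

With roughly 5k versions and a batch size of 1000, this would result in five separate deletion batches instead of one oversized one. A real implementation would also have to decide what happens when one batch fails after earlier ones have succeeded.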

See also: T198176: Mediawiki page deletions should happen in batches of revisions. See also T425897: Future of filearchive table for a potential long-term solution.

Event Timeline

I think that we should try to avoid putting an upper limit on versions a file can have, as our "deletion" tool doesn't actually delete anything.

Are there any reasons we can't just reuse the code and logic from bigdelete and DeleteRevisionsBatchSize for file deletions? That could be done either as a slightly renamed copy (e.g. BigDeleteFiles and DeleteFileRevisionsBatchSize) or by making bigdelete apply to both files and revisions.

I'm unsure which would be the cleaner option, but I think reusing the bigdelete system under a new name is probably the best way to avoid breaking anything. (As for policy on how it should be handled, I'm thinking exactly the same as for normal bigdelete, but Commons is probably the only place where it matters.)

KineticPelagic subscribed.

As part of my Clinic Duty rotation, I am moving this task to the "Needs Further Discussion" column of our MWI workboard because:

  • An internal MediaWiki team member identified the issue.
  • The task description gives some insight into a workaround and how much effort this workaround takes.
  • I wonder how frequently we are asked to delete files with this many versions.
  • I am not sure if this task should go to Radar and be for the MediaWiki Platform team. I have asked in our team Slack channel. I see that Bill completed a related task in 2018.
  • I need more context from our team to understand where this task fits in our priorities.

Even if the use case is uncommon, using the job queue to delete files with many revisions, similar to T198176, would be an optimization. We also need to consider undeletion; the corresponding task for pages (rather than files) is T239095, which is still open.

Adding to @KineticPelagic's comment: could we get the number or percentage of files whose version count goes beyond the threshold? I agree that we should not necessarily have an upper bound in place, but I would like to get a better sense of how commonly folks are running into this for prioritization purposes.
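
To make the requested figure concrete, here is a minimal Python sketch of how the count and percentage could be computed once per-file version counts are available. The input mapping, the function name `share_over_threshold`, and the choice to treat "at or above 1000" as exceeding the threshold are all assumptions for illustration; how the counts are obtained from the wiki's database is left out.

```python
from typing import Mapping, Tuple


def share_over_threshold(
    version_counts: Mapping[str, int], threshold: int = 1000
) -> Tuple[int, float]:
    """Return (number of files, percentage of files) at or above threshold.

    version_counts maps a file name to its number of uploaded versions.
    Assumption: a file counts as affected when it has >= threshold versions.
    """
    total = len(version_counts)
    over = sum(1 for n in version_counts.values() if n >= threshold)
    percentage = 100.0 * over / total if total else 0.0
    return over, percentage


# Made-up example data, not real statistics from any wiki.
example_counts = {"A.png": 4800, "B.png": 3, "C.png": 1000, "D.png": 999}
over, percentage = share_over_threshold(example_counts)
print(f"{over} files ({percentage:.1f}%) are at or above the threshold")
```

The example data is invented purely to exercise the function and says nothing about how common such files actually are.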