Fast delete of large directory trees

Apr 12, 2015 15:04

I do incremental backups on my Linux server using rsync, creating a backup directory for every day. Only changed files are synchronized; unchanged files are hardlinked to their copies in the previous backup. That way unchanged files exist only once on the backup drive, but every backup directory still shows all files of that day.

From time to time I delete old backups. And that takes FOREVER. It is unbelievable how long a simple rm -rf on one of those backup directories takes, especially considering that most files won't actually be deleted from the disk because they are still part of some older or newer backup. Only the directory entries of the hardlinks get deleted.
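A small demonstration in a throwaway directory of what "only the directory entries get deleted" means: removing one hardlink just drops a name and decrements the inode's link count, while the data stays on disk as long as another link exists. GNU `stat` is assumed for the `-c` format option.

```shell
# Two "backups" share one file through a hardlink.
demo=$(mktemp -d)
mkdir "$demo/day1" "$demo/day2"
echo "unchanged content" > "$demo/day1/file.txt"
ln "$demo/day1/file.txt" "$demo/day2/file.txt"   # same inode, two names

stat -c '%i links=%h' "$demo/day1/file.txt"        # link count is 2

# Deleting day1 only removes its directory entries ...
rm -rf "$demo/day1"

# ... the data is still reachable through day2, now with link count 1.
stat -c '%i links=%h' "$demo/day2/file.txt"
cat "$demo/day2/file.txt"
# Clean up afterwards with: rm -rf "$demo"
```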

To name some numbers: it takes 6:30 minutes to delete just ONE of those backup directories (each containing about 1.6 million files). My plan for today was to delete about 200 of those daily backup directories and keep only monthly backups. At 6:30 each, that is 200 × 6.5 = 1,300 minutes, almost 22 hours. What. The. Fuck!?
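Picking the daily directories to drop can be scripted. A sketch, assuming hypothetical directory names of the form YYYY-MM-DD and keeping the oldest snapshot of each month as the "monthly" backup (the post doesn't say which one is kept). The hypothetical `dirs_to_prune` only prints names; it deletes nothing.

```shell
# Print the daily snapshots that can go, keeping the oldest one of each month.
# Assumes hypothetical directory names like 2015-04-12 directly under $1.
dirs_to_prune() {
    ls -1 "$1" | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' | sort |
        awk '{ m = substr($0, 1, 7); if (m in kept) print; else kept[m] = 1 }'
}
```

Review the printed list before handing it to any delete command.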

So I googled.

To my surprise I didn't find the all-winning answer proposing some simple command that would delete those directories in a snap. Apparently the problem is that most file systems delete those files one by one and trigger a metadata update for every single file, which may even result in the journal being written to disk for every delete. Yay.
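To get a feeling for how much work that is, a sketch that counts every entry below a directory. `rm -rf` removes each file and directory with its own unlink or rmdir system call, so this count, not the amount of data, is what drives the runtime. The function name `count_entries` is hypothetical; `find -print0` is assumed to be available (GNU and BSD find have it).

```shell
# Count all entries (files, symlinks, directories, including the top one)
# below a directory. NUL separators keep odd file names from skewing the count.
count_entries() {
    find "$1" -print0 | tr -dc '\0' | wc -c
}
```

With the numbers from above (about 1.6 million files in 6:30 minutes), that works out to roughly 4,100 entries per second.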

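But at least I found an alternative which is still quite slow, yet almost twice as fast as rm -rf: sync an empty directory into the one to be deleted using rsync --delete, then remove the emptied directory.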
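mkdir /tmp/blank
# the trailing slash on /tmp/blank/ matters; the second path is the directory to empty
rsync -a --delete /tmp/blank/ /path/to/backup-dir/
rmdir /path/to/backup-dir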

This takes 3:20 minutes per backup directory, compared to 6:30 with rm -rf.

