Mar 30, 2010 01:27
I'm clearing up some hard drive space, because my goals require it.
I noticed that I have loads of duplicate files around. Sometimes it's a matter of me dumping photos on my hard drive more than once, "just in case." Sometimes I sort those photos into folders on different drives, but forget to delete the originals. In any case, swaths of duplicated files.
Googling for a tool to help me with the problem, I discovered fdupes. Download, install, okay. I read the man page, and feed it the directories I want to search for dupes in.
valacosa@Bastion:~$ fdupes -r /media/GAMES\ 2/DCIM-Delete\ After\ Backup/ /media/PHOTO1/█
It displays its progress. Excellent. But when it reached 100%, it just spat out a bunch of text.
--- snip ---
/media/GAMES 2/DCIM-Delete After Backup/153CANON/MVI_5328.THM
/media/PHOTO1/Camera Dump 02/2005-07-10--Sci Orientators Orientation/MVI_5328.THM
/media/GAMES 2/DCIM-Delete After Backup/153CANON/MVI_5328.AVI
/media/PHOTO1/Camera Dump 02/2005-07-10--Sci Orientators Orientation/MVI_5328.AVI
/media/GAMES 2/DCIM-Delete After Backup/153CANON/IMG_5327.JPG
/media/PHOTO1/Camera Dump 02/2005-07-10--Sci Orientators Orientation/IMG_5327.JPG
/media/GAMES 2/DCIM-Delete After Backup/153CANON/IMG_5326.JPG
/media/PHOTO1/Camera Dump 02/2005-07-10--Sci Orientators Orientation/IMG_5326.JPG
/media/GAMES 2/DCIM-Delete After Backup/153CANON/IMG_5325.JPG
/media/PHOTO1/Camera Dump 02/2005-07-10--Sci Orientators Orientation/IMG_5325.JPG
--- snip ---
It all scrolled by so quickly on the terminal screen, what seemed like hundreds of lines of output.
"Wait," I thought to myself. "I know how to solve this problem. I'll just shunt the output into a text file."
valacosa@Bastion:~$ fdupes -r /media/GAMES\ 2/DCIM-Delete\ After\ Backup/ /media/PHOTO1/ > dupes.txt█
And it works. I open the text file in gedit and start rifling through it. Turns out the command spat out over 10,000 lines of output. I start deleting files, one by one. "Ugh, there has to be a better way to do this," I think. I read the man page for fdupes again; it turns out fdupes itself can delete files. Great.
But the delete files option works something like this:
[1] /media/PHOTO1/DCIM/HUNT/IMG_6158.JPG
[2] /media/PHOTO1/Camera Dump 02/2006-01-14--SciGames - Day Four/IMG_6158.JPG
Set 1 of 3414, preserve files [1 - 2, all]: 2
[-] /media/PHOTO1/DCIM/HUNT/IMG_6158.JPG
[+] /media/PHOTO1/Camera Dump 02/2006-01-14--SciGames - Day Four/IMG_6158.JPG
[1] /media/PHOTO1/DCIM/HUNT/IMG_6157.JPG
[2] /media/PHOTO1/Camera Dump 02/2006-01-14--SciGames - Day Four/IMG_6157.JPG
Set 2 of 3414, preserve files [1 - 2, all]: 2
[-] /media/PHOTO1/DCIM/HUNT/IMG_6157.JPG
[+] /media/PHOTO1/Camera Dump 02/2006-01-14--SciGames - Day Four/IMG_6157.JPG
[1] /media/PHOTO1/DCIM/HUNT/IMG_6155.JPG
[2] /media/PHOTO1/Camera Dump 02/2006-01-14--SciGames - Day Four/IMG_6155.JPG
Set 3 of 3414, preserve files [1 - 2, all]: █
I couldn't even type "a"; I had to type out "all" if I wanted to keep everything. Ugh. There had to be a better way to do that. So instinctively, I reached for the GUI.
I downloaded and installed fslint, which promised a GUI. And that it had. I pointed it at the directories I wanted to search, and let it do its thing.
And waited.
And waited.
See, the damnedest thing was, the program didn't even give me any progress bars! It could be working, or it could be hung. I had no way of knowing. Also, I read fdupes compared MD5 hashes; fslint could be doing something sillier for all I knew. So I got impatient, and killed it. And then, what I was supposed to do all along dawned on me. I wasn't supposed to use a GUI here. I was supposed to use the power of Linux: chaining and piping.
So I ended up doing something like this:
valacosa@Bastion:~$ cat dupes.txt | grep 152CANON | wc -l
105
valacosa@Bastion:~$ ls /media/GAMES\ 2/DCIM-Delete\ After\ Backup/152CANON/ | wc -l
105
valacosa@Bastion:~$ cat dupes.txt | grep 153CANON | wc -l
98
valacosa@Bastion:~$ ls /media/GAMES\ 2/DCIM-Delete\ After\ Backup/153CANON/ | wc -l
98
valacosa@Bastion:~$█
If the two counts match, I know all the files in the folder are also listed in the dupes.txt file I generated, so I can just delete the redundant directories as I go along.
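Matching counts is a quick sanity check, but it only compares two numbers. A slightly stricter take on the same idea, as a rough sketch: look up each file in the folder and make sure its full path shows up as its own line in dupes.txt. The function name all_listed is made up for illustration, and it assumes dupes.txt holds the paths exactly as fdupes printed them, one per line.

```shell
#!/bin/sh
# all_listed DIR LIST (hypothetical helper, not part of fdupes)
# Succeeds only if every regular file directly inside DIR appears as an
# exact, whole line in LIST (the saved fdupes output). Like a plain "ls",
# it skips hidden files and does not descend into subdirectories.
# An empty DIR counts as success, since there is nothing left to check.
all_listed() {
    dir=${1%/}      # drop one trailing slash so paths match fdupes' output
    list=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # -F: fixed string, -x: match the whole line, -q: quiet
        if ! grep -Fqx -- "$f" "$list"; then
            printf 'not listed as a duplicate: %s\n' "$f" >&2
            return 1
        fi
    done
    return 0
}

# Example use (same paths as in the post):
#   all_listed "/media/GAMES 2/DCIM-Delete After Backup/152CANON/" dupes.txt \
#       && echo "every file in 152CANON has a copy somewhere"
```

Being listed only means fdupes found another copy somewhere in the directories it scanned. It doesn't say where that copy lives, so it's still worth eyeballing dupes.txt before deleting a whole folder.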
Now I don't know if I should explain what a command like ls /media/GAMES\ 2/DCIM-Delete\ After\ Backup/152CANON/ | wc -l is doing. I get the feeling that those who care already know, while those that don't care have already stopped reading this post.
geek,
linux