FlipsideReality: Once upon a time, in a land far, far away…

9 Dec 2007

How to replace duplicate files with hard links.

Like most people, I have multiple backups of the same files, stored in an ad-hoc structure. I went hunting for a good utility to remove duplicates and replace them with hard links.

It surprised me that there is a tool for doing this on NTFS volumes under Windows. Update: and another free one!

I found a Perl script called trimtrees.pl. You can find it on CPAN; it describes itself as follows:

Traverse all directories named on the command line, compute MD5
checksums and find files with identical MD5. If they are equal, do a
real comparison; if they are really equal, replace the second of two
files with a hard link to the first one.

Special care is taken to cope with C<…> error conditions.
The inode that is overbooked in such a way is taken out of the pool
and replaced with another one such that the minimum of files
needed is kept on disk.

The --maxlinks option can be used to reduce the link count on all
files within a tree, thus preparing the tree for a subsequent call to
C<…>. This operation can be thought of as the reverse of the normal
trimtrees operation (--maxlinks=1 produces a tree without hard links).
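The first paragraph of that description translates into a short program. What follows is a minimal Python sketch of the same idea, not trimtrees.pl itself. It assumes a POSIX-style filesystem that supports hard links and roots that all live on one filesystem (os.link cannot cross filesystems). The names trim_tree and md5_of and the .trimtmp temporary suffix are hypothetical. An MD5 match is treated only as a hint; filecmp.cmp with shallow=False does the real byte-for-byte comparison before the second file is swapped for a link to the first. The sketch does not deal with link-count limits, and a replaced file takes on the first copy's permissions and timestamps, since both names then share one inode.

```python
import filecmp
import hashlib
import os


def md5_of(path, chunk_size=1 << 16):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def trim_tree(*roots):
    """Replace duplicate regular files under `roots` with hard links.

    The first file seen with given contents is kept; later identical
    files become hard links to it. Returns the number of files replaced.
    """
    originals = {}  # MD5 hex digest -> paths kept as originals
    replaced = 0
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # deterministic traversal order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue  # skip symlinks and special files
                digest = md5_of(path)
                for original in originals.get(digest, []):
                    if os.path.samefile(original, path):
                        break  # already a hard link to this original
                    # Equal MD5 is only a hint; compare byte for byte.
                    if filecmp.cmp(original, path, shallow=False):
                        tmp = path + ".trimtmp"  # hypothetical temp name
                        os.link(original, tmp)
                        os.replace(tmp, path)  # swap the duplicate for the link
                        replaced += 1
                        break
                else:
                    # No identical original yet: keep this file as one.
                    originals.setdefault(digest, []).append(path)
    return replaced
```

Calling trim_tree("/backups/old", "/backups/new") (placeholder paths) would link duplicates within and across both trees and return how many files were replaced; running it a second time finds nothing left to do.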
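For the second paragraph of the description, assume the error condition meant is the filesystem's per-inode hard-link limit (EMLINK, "Too many links"), which is what an "overbooked" inode suggests. A hypothetical helper, link_to_canonical, could then keep one canonical path per content key in a pool and, when os.link fails with EMLINK, take that inode out of the pool and promote the current file to be the new canonical copy. This is a sketch of the idea under that assumption, not the script's code; it also assumes the caller has already confirmed that the two files have identical contents, and the .trimtmp suffix is again hypothetical.

```python
import errno
import os


def link_to_canonical(pool, key, path):
    """Hard-link `path` to the canonical copy recorded for `key`.

    `pool` maps a content key (e.g. an MD5 digest) to the path whose inode
    currently serves as the canonical copy. The caller must already have
    confirmed that `path` has the same contents. If the canonical inode
    has reached the filesystem's link limit (EMLINK), it is taken out of
    the pool and `path` becomes the new canonical copy.
    Returns True if `path` was replaced by a hard link.
    """
    original = pool.get(key)
    if original is None:
        pool[key] = path  # first file seen with these contents
        return False
    tmp = path + ".trimtmp"  # hypothetical temporary name
    try:
        os.link(original, tmp)
    except OSError as exc:
        if exc.errno != errno.EMLINK:
            raise  # some other failure: let the caller see it
        pool[key] = path  # retire the overbooked inode
        return False
    os.replace(tmp, path)  # swap the duplicate for the new link
    return True
```

Retiring the full inode instead of aborting means later duplicates start filling a fresh inode, so the number of separate copies stays close to the minimum the link limit allows.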
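Going the other way, the --maxlinks idea from the last paragraph can also be sketched in Python. This is a hypothetical helper (limit_links, with a .splittmp temporary suffix), not the script's implementation: it walks a tree and, whenever a regular file's inode has more than maxlinks links, replaces that name with an independent copy made by shutil.copy2. Note that st_nlink counts links anywhere on the filesystem, not only inside the tree. This simple version gives every split-off name its own copy, so with maxlinks above 1 it may use more disk than strictly necessary; the description does not say how trimtrees.pl itself distributes the copies.

```python
import os
import shutil
import stat


def limit_links(root, maxlinks=1):
    """Ensure no regular file under `root` has more than `maxlinks` links.

    Each excess name is replaced by an independent copy (a new inode with
    a link count of 1), so maxlinks=1 leaves a tree with no hard links.
    Returns the number of names that were split off.
    """
    if maxlinks < 1:
        raise ValueError("maxlinks must be at least 1")
    split = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # fresh stat: counts drop as names split off
            if not stat.S_ISREG(st.st_mode) or st.st_nlink <= maxlinks:
                continue
            tmp = path + ".splittmp"  # hypothetical temporary name
            shutil.copy2(path, tmp)  # new inode with the same contents
            os.replace(tmp, path)  # this name no longer shares the old inode
            split += 1
    return split
```

With maxlinks=1 every file ends up with a link count of 1, which matches the "tree without hard links" case in the description.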

Comments (5)
  1. That really helped :) Thanks. I use this on my podcast directories now, because my original podcatcher is not intelligent enough to do so.

    The Starseeker

  2. This sounds like exactly what I am looking for, but CPAN doesn’t have that link. Searching on CPAN was no joy. I did find some scripts using a search engine; I think the second one is the one you used, since the URL looks similar. It looks like CPAN is changing its directory structure. http://cpansearch.perl.org/src/ANDK/Perl-Repository-APC-2.002/eg/trimtrees.pl

  3. I found Duplicate Finder 2009, the most advanced version of duplicate file finder on the market for Windows systems…

  4. Yet another freeware tool I found with this feature is Duplicate Cleaner (http://www.digitalvolcano.co.uk/content/duplicate-cleaner).

    P.S.
    The above post by asmkrt is probably spam. “Duplicate Finder 2009” is not freeware.

  5. Using a tool to find and remove duplicate files is an efficient way to manage files on a computer! Thanks to the guys who developed this perfect tool. You know, now I can easily find and remove duplicate files on my computer!

