9 Dec 2007

How to replace duplicate files with hard links

Like most people, I have multiple backups of the same files, stored in an ad-hoc structure. I went hunting for a good utility to find the duplicates and replace them with hard links.

I was surprised to find that there is a tool for doing this on NTFS volumes under Windows. Update: and another free one!

I found a Perl script called trimtrees.pl. You can find it on CPAN; it describes itself as follows:

Traverse all directories named on the command line, compute MD5
checksums and find files with identical MD5. IF they are equal, do a
real comparison if they are really equal, replace the second of two
files with a hard link to the first one.

Special care is taken to cope with C error conditions.
The inode that is overbooked in such a way, is taken out of the pool
and replaced with the another one such that the minimum of files
needed is kept on disk.

The C<--maxlinks> option can be used to reduce the linkcount on all
files within a tree, thus preparing the tree for a subsequent call to
C. This operation can be thought of the reverse of the normal
trimtrees operation (--maxlinks=1 produces a tree without hard links).
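
To make the description above more concrete, here is a rough Python sketch of the same idea: hash every file, confirm a match byte for byte, then swap the later copy for a hard link to the first. This is not the trimtrees.pl code. The function names (`file_digest`, `same_content`, `link_duplicates`) are made up for illustration, and the way a failed link is handled (the current file becomes the new link target) is only my reading of the "taken out of the pool" sentence.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def same_content(path_a, path_b, chunk_size=1 << 20):
    """Byte-for-byte comparison, done after the MD5 digests match."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:
                return True


def link_duplicates(roots):
    """Replace later duplicates under `roots` with hard links to the first copy.

    Returns the number of files that were replaced.
    """
    first_seen = {}  # MD5 digest -> path of the first file with that digest
    replaced = 0
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # walk subdirectories in a predictable order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                digest = file_digest(path)
                keeper = first_seen.setdefault(digest, path)
                if keeper == path or os.path.samefile(keeper, path):
                    continue  # first copy, or already the same inode
                if not same_content(keeper, path):
                    continue  # MD5 collision: leave both files alone
                # Link under a temporary name, then rename over the duplicate,
                # so a failed link never destroys the original file.
                fd, tmp = tempfile.mkstemp(dir=dirpath)
                os.close(fd)
                os.unlink(tmp)
                try:
                    os.link(keeper, tmp)
                except OSError:
                    # For example EMLINK (too many links) or a cross-device
                    # link. Assumption: use this file as the new link target.
                    first_seen[digest] = path
                    continue
                os.replace(tmp, path)
                replaced += 1
    return replaced
```

Calling `link_duplicates(["/path/to/backups"])` (a placeholder path) rewrites files in place, so try it on a copy first. Hard links only work within a single filesystem, which is why a failed `os.link` is tolerated rather than treated as fatal.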

Comments (1)
  1. starseeker
    2:27 pm on February 14th, 2008

    That really helped :) Thanks. I use this on my podcast directories now, because my original podcatcher is not intelligent enough to do so.

    The Starseeker
