Deduplication

Deduplicating identical files from a recovered disk

When recovering files, and particularly during a forensic investigation, it is very common to find duplicate copies of the same file or photograph. Typically, there is little point in keeping multiple instances of the same file.

CnW has a very simple feature that allows duplicates to be removed. The feature is found in the log, and any log showing recovered files can be deduplicated. As always with CnW software, the duplicate files are removed from the recovered drive, and never from the original drive. Once the DeDup function has run, the entry for each file that has been removed is marked as DeDup in the status column.

Deduplication works by using the MD5 hash value of each file. It is extremely unlikely that two different files will have the same hash value by chance (although MD5 collisions can be constructed deliberately), so in practice the hash acts as a digital fingerprint of the contents. The name and date of the file are not relevant; only the contents are compared. Thus two files with very different names but the same contents will be deduplicated, while two files with the same name that differ by just one byte will not be deduplicated.


The screenshot above shows the log once it has been deduplicated. As can be seen, files with different names are identical, and so one has been removed.
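The hash comparison described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not CnW's own code; the function names `md5_of_file` and `find_duplicates` are hypothetical, and the sketch only reports duplicates rather than removing anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return {md5: [paths]} for every hash shared by more than one file.

    File names and dates are ignored; only the contents are hashed.
    """
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[md5_of_file(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Calling `find_duplicates` on a folder of recovered files gives, for each shared hash, the list of files with identical contents. A tool would then keep one file from each list and treat the rest as duplicates.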

A very useful time to use this function is when doing a raw recovery. For instance, a disk may often contain multiple instances of photos where only one is required, and with this function all the surplus instances are deleted. If a disk is copied in the mode where good files are recovered and then the disk is scanned for unallocated space, the unallocated files will be indicated by the status value of ‘Rec’vd’. When deduplicating, the files recovered from unallocated space will be deleted in preference to files found in the logical area of the disk.
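The preference rule can be illustrated with a small Python sketch. The `LogEntry` record and `mark_duplicates` function are hypothetical and do not reflect CnW's internal log format. The sketch assumes that an empty status means a file from the logical area, and that "Rec'vd" marks a file recovered from unallocated space, as described above.

```python
from dataclasses import dataclass

UNALLOCATED = "Rec'vd"  # status of files recovered from unallocated space
DEDUP = "DeDup"         # status given to entries removed as duplicates


@dataclass
class LogEntry:
    name: str
    md5: str
    status: str = ""    # assumption: empty status means a logical file


def mark_duplicates(entries):
    """Mark all but one entry per hash as DeDup, keeping logical files first."""
    kept = {}  # md5 -> the entry currently kept for that hash
    for entry in entries:
        current = kept.get(entry.md5)
        if current is None:
            kept[entry.md5] = entry
        elif current.status == UNALLOCATED and entry.status != UNALLOCATED:
            # A logical file replaces an unallocated one as the kept copy.
            current.status = DEDUP
            kept[entry.md5] = entry
        else:
            entry.status = DEDUP
    return entries
```

With this rule, if the same photo appears once in the logical area and once in unallocated space, the unallocated copy is always the one marked as DeDup, whichever entry comes first in the log.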

Another use of the MD5 hash value is in the file filter, where it can eliminate system files. One can download tables of MD5 values for known operating system files and general applications. Using such a table with the file filter is the equivalent of deduplicating against any known good files.
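As a rough sketch of the same idea in Python, the filter below drops any file whose hash appears in a table of known values. The table format (one 32-character hexadecimal MD5 value per line) and the function names are assumptions made for illustration; real hash tables and the CnW file filter may be organised differently.

```python
def load_known_hashes(lines):
    """Build a set of lowercase MD5 hex strings from an iterable of text lines."""
    known = set()
    for line in lines:
        value = line.strip().lower()
        # Keep only well-formed 32-character hexadecimal values.
        if len(value) == 32 and all(c in "0123456789abcdef" for c in value):
            known.add(value)
    return known


def filter_unknown(files, known):
    """Return the (name, md5) pairs whose hash is not in the known set."""
    return [(name, md5) for name, md5 in files if md5.lower() not in known]
```

Here `files` is a list of `(name, md5)` pairs taken from a recovery log, and `lines` can be an open text file. Only files that do not match a known hash remain, which leaves the files most likely to be of interest.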