File hashing

Recovering images based on MD5 hash values

During an investigation it is useful to be able to select images by known MD5 values. CnW software offers this option both for working disks and for images recovered from a raw scan.

Part of the file filter is the option to select a table of hash values. For police investigations this table may be filled with known ‘dubious’ hash values. The hash table is a sorted ASCII text file containing every hash value to be tested. The filter can then be set to skip all files except those that match the hashes. Because the hash depends only on the file content, not on the file name, renaming a file does not prevent a match.
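The filtering idea can be sketched in a few lines of Python. This is only an illustration and not CnW's implementation: it assumes a table with one hexadecimal MD5 value per line, and the function names (md5_of_file, load_hash_table, in_table, matching_files) are hypothetical.

```python
import bisect
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 of a file's contents as lowercase hex."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_hash_table(path):
    """Load one hash per line (assumed layout), lowercased and sorted."""
    with open(path, "r", encoding="ascii") as handle:
        hashes = [line.strip().lower() for line in handle if line.strip()]
    hashes.sort()  # harmless if the file is already sorted
    return hashes


def in_table(table, value):
    """Binary search, which is what a sorted table makes possible."""
    index = bisect.bisect_left(table, value)
    return index < len(table) and table[index] == value


def matching_files(root, table):
    """Yield files under root whose MD5 appears in the table."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and in_table(table, md5_of_file(path)):
            yield path
```

Only the file content is hashed, so a renamed copy of a listed file is still reported.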

CnW Recovery software has another very powerful feature that relates to hashing and raw imaging. Many recovery programs will correctly find JPEG files but not set the correct length, so a hash of the recovered file will not match the hash of the original. When CnW finds a JPEG file, it calculates the correct file size and then recalculates the hash value. There is also a mode that finds JPEGs on disks that had NTFS compression enabled.
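To show why the length matters, here is a minimal sketch of one common way to find the end of a JPEG in raw data: walk the marker segments from the start-of-image marker to the end-of-image marker, then hash exactly that many bytes. It is not CnW's algorithm. It assumes a contiguous, undamaged file, and the names jpeg_length and carved_md5 are hypothetical.

```python
import hashlib


def _skip_scan_data(data, pos):
    """Return the offset of the first real marker after entropy-coded data."""
    while True:
        pos = data.find(b"\xff", pos)
        if pos == -1 or pos + 1 >= len(data):
            return None  # ran out of data before the next marker
        following = data[pos + 1]
        if following == 0x00 or 0xD0 <= following <= 0xD7:
            pos += 2  # stuffed 0xFF byte or restart marker: still scan data
        elif following == 0xFF:
            pos += 1  # fill byte
        else:
            return pos


def jpeg_length(data, start=0):
    """Return the length in bytes of the JPEG at `start`, or None."""
    if data[start:start + 2] != b"\xff\xd8":  # SOI marker
        return None
    pos = start + 2
    size = len(data)
    while pos + 1 < size:
        if data[pos] != 0xFF:
            return None  # not at a marker: damaged or fragmented
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker == 0xD9:  # EOI: the image ends here
            return pos + 2 - start
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2  # standalone markers carry no length field
            continue
        if pos + 3 >= size:
            return None
        segment_length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if segment_length < 2:
            return None
        # Skipping by length also steps over thumbnails held in APPn segments.
        pos += 2 + segment_length
        if marker == 0xDA:  # SOS: compressed scan data follows the header
            pos = _skip_scan_data(data, pos)
            if pos is None:
                return None
    return None


def carved_md5(data, start=0):
    """MD5 over exactly the bytes of the JPEG, ignoring trailing slack."""
    length = jpeg_length(data, start)
    if length is None:
        return None
    return hashlib.md5(data[start:start + length]).hexdigest()
```

If the recovered file were saved with even one extra byte of trailing slack, its MD5 would differ from that of the original image, and a hash table lookup would fail.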

NSRL Hash tables
The NSRL (National Software Reference Library) publishes tables of hash values for known good files. If the hash value of each file is tested during a recovery, these files can be skipped. A recognised hash value means that the file has not been changed in any way and so does not need investigating, i.e. it will not contain hidden data or be a data file disguised as something else. CnW software (Forensic version) has a function to download and import these hash tables for use with the file filter.
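As an illustration of the skip logic, the sketch below builds a set of known MD5 values and tests content against it. It assumes a comma-separated text file with a header row that includes an "MD5" column, as in the NSRLFile.txt layout of the older RDS 2.x releases. The import format CnW actually uses is not described here, and the function names are hypothetical.

```python
import csv
import hashlib


def load_known_md5(csv_file):
    """Collect the MD5 column of an NSRL-style CSV into a set."""
    reader = csv.DictReader(csv_file)
    return {row["MD5"].strip().lower() for row in reader if row.get("MD5")}


def needs_investigation(content, known_good):
    """True when the MD5 of the content is absent from the known-good set."""
    return hashlib.md5(content).hexdigest() not in known_good
```

A typical use would be to open the table with open(path, newline="", encoding="ascii"), load it once, and then call needs_investigation for each recovered file.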

When the forensic option has been purchased, the log lists all matching files along with their MD5 values.

Which hashing routine to use?
CnW creates hash values in both MD5 and SHA-256. For processing, for instance file filtering and deduplication, only MD5 is used. So why have both values? The reason is that there have been various attacks on both MD5 and SHA-1, and examples have been produced of two different files with the same hash value. Such collisions have to be constructed deliberately; two unrelated files sharing a hash value by accident does not happen in practice. For the same reason, if a file is corrupted in transmission or storage, its hash value will change. The main purpose of a hash value is to prove that a file has not been changed in any way since it was produced.
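Both values can be produced in a single read of the file. The sketch below uses Python's standard hashlib module and is only an illustration of the idea, not CnW's code; the function name md5_and_sha256 is hypothetical.

```python
import hashlib


def md5_and_sha256(path, chunk_size=1 << 20):
    """Return (md5_hex, sha256_hex) for a file, reading it only once."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            # Feed the same chunk to both digests.
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

The MD5 value is 32 hexadecimal characters (128 bits) and the SHA-256 value is 64 (256 bits). Changing a single byte of the file changes both.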

Technology is moving very quickly, and it is possible that in several years' time routines and processors may be able to modify a file and still retain its original hash value. For this reason, as cases can take several years to get to court, CnW has added a second hash value, SHA-256, which is much longer and much more secure. MD5 nevertheless remains safe for the processing uses described above.