Data Carving

Data carving is an important tool for recovering files from unallocated drive space or from a badly corrupted disk.  It is based on the CnW disk imaging routine, but goes several stages further and attempts to reconstruct fragmented files automatically.  It can be slow, but if a file is critical it is well worthwhile, and quicker than trying to process by hand.  Very few data recovery programs can recover fragmented files when the operating system details have been lost or corrupted; CnW often succeeds with this process.  For forensic purposes, it logs the start of every file it finds, and the individual fragments when it reconstructs a JPEG.


Recovery based on signature alone often works extremely well, but if a file is fragmented, the recovered file will not be valid.


The approach that CnW Recovery takes is to do a standard raw recovery, based on signatures and headers, to track down sequential files.  These files are then verified, and when the verification indicates a complete and valid file, the space it occupies on the drive is marked as used.  This reduces the number of sectors that have to be searched for other file fragments.  The program thus builds up an internal map of the areas of the disk where fragments may be found, and the areas where the data is known and in effect allocated.
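
The internal map described above can be pictured as one flag per sector.  The following is only a minimal sketch of that idea, not CnW's actual implementation; the class name SectorMap and its methods are hypothetical names chosen for illustration.

```python
class SectorMap:
    """One flag per sector: set once a verified file is known to occupy it."""

    def __init__(self, total_sectors):
        # 0 = still a candidate for fragments, 1 = known and in effect allocated
        self.used = bytearray(total_sectors)

    def mark_used(self, first_sector, sector_count):
        # Called after a sequential file has been recovered and verified.
        for s in range(first_sector, first_sector + sector_count):
            self.used[s] = 1

    def candidate_runs(self):
        """Yield (first_sector, sector_count) for each run still worth searching."""
        start = None
        for s, flag in enumerate(self.used):
            if not flag and start is None:
                start = s
            elif flag and start is not None:
                yield (start, s - start)
                start = None
        if start is not None:
            yield (start, len(self.used) - start)


disk = SectorMap(100)
disk.mark_used(10, 20)   # a verified file in sectors 10 to 29
disk.mark_used(50, 5)    # another verified file in sectors 50 to 54
runs = list(disk.candidate_runs())
print(runs)
```

Each verified file shrinks the runs that later fragment searches have to consider.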


The actual recovery of files has to be done on a file type by file type basis.  In raw recovery, the operating system gives no assistance as to where the fragments are stored, although the above procedures help to lock out areas.  Hence data carving does rely on a lot of trial and error.


The success rate often depends on the mix of files on a disk or memory chip.  If a single JPEG is fragmented, and all other files are XML files, joining fragments is easy.  If there are a lot of Word Doc files, all fragmented, it is very easy to get false matches.


To reconstruct a file, the starting stub is required first.  This is typically found by a signature; for instance, a JPEG file starts with 0xFF 0xD8 0xFF, normally followed by 0xE0 or 0xE1.  After the initial header blocks, a JPEG file is made up of sectors of compressed data.  The recovery routine can therefore ensure that the candidate sectors to add contain compressed data, and then verify whether the incomplete JPEG file is still consistent with the additional data.  This process is continued until a complete file is constructed, or it is determined that the process is not working.
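
The two checks described above, finding a starting stub by signature and testing whether a candidate sector looks like compressed data, might be sketched as follows.  This is an illustration under stated assumptions rather than the CnW routine: the 512 byte sector size, the function names, and the use of Shannon entropy with a threshold of 7.0 bits per byte as the "compressed data" test are all assumptions made for the example.

```python
import math

SECTOR_SIZE = 512  # assumed sector size


def find_jpeg_starts(image, sector_size=SECTOR_SIZE):
    """Return byte offsets of sectors that begin FF D8 FF followed by E0 or E1."""
    starts = []
    for off in range(0, len(image) - 3, sector_size):
        if image[off:off + 3] == b"\xff\xd8\xff" and image[off + 3] in (0xE0, 0xE1):
            starts.append(off)
    return starts


def entropy_bits_per_byte(sector):
    """Shannon entropy of the byte values in one sector (0.0 to 8.0)."""
    counts = [0] * 256
    for b in sector:
        counts[b] += 1
    n = len(sector)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def looks_compressed(sector, threshold=7.0):
    # Assumed heuristic: compressed data has a near-uniform byte distribution,
    # while text and zero-filled sectors score much lower.
    return entropy_bits_per_byte(sector) >= threshold


image = bytearray(4 * SECTOR_SIZE)
image[512:516] = b"\xff\xd8\xff\xe0"     # signature at the start of sector 1
image[1546:1550] = b"\xff\xd8\xff\xe1"   # not sector aligned, so it is ignored
starts = find_jpeg_starts(image)
print(starts)
```

A sector that passes the entropy test is only a candidate; it still has to be appended and the partial JPEG checked for consistency, as the paragraph above explains.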


CnW Recovery software has automatic routines using data carving methods for files such as JPEG and AVI files.  This list will grow on a regular basis.


JPEG Carving

JPEG carving is particularly useful with camera memory chips.  After a raw recovery has been performed on a memory chip, there are often several images that do not open.  If they are fragmented, the process fragments function will normally reconstruct between 25% and 75% of these images.  The routine works best when the fragments of the image are stored in the correct order on the chip, even if they are not adjacent.  If sections of the file are missing, because they have been overwritten, no data carving will work.  CnW does not attempt to insert data to fix the image.


AVI Carving

AVI files have a structure that is very tolerant to carving.  They are also tolerant to sections of corrupted data, as long as critical pointers are valid.  CnW software will therefore join fragments together to make a valid file.  Development is taking place so that when the end of an AVI cannot be found, a suitable trailer record will be added so that the reconstructed AVI will open and play.
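
An AVI is a RIFF file: a 12 byte header ("RIFF", a little-endian length, "AVI ") followed by chunks that each carry a 4 byte identifier and a 4 byte little-endian length.  The sketch below walks the top-level chunks and stops when the next identifier is not printable ASCII, which suggests the data has run into a foreign fragment.  The function names and the printable-ASCII test are assumptions for illustration, not the checks CnW itself performs.

```python
import struct


def walk_riff_chunks(data, offset=12):
    """Yield (offset, fourcc, size) for top-level chunks after the RIFF header."""
    while offset + 8 <= len(data):
        fourcc, size = struct.unpack_from("<4sI", data, offset)
        if not all(0x20 <= b < 0x7F for b in fourcc):
            break  # identifier is not printable ASCII: probably a wrong fragment
        yield offset, fourcc, size
        offset += 8 + size + (size & 1)  # chunk data is padded to an even length


def make_chunk(fourcc, payload):
    # Helper used only to build the synthetic example below.
    pad = b"\x00" if len(payload) & 1 else b""
    return fourcc + struct.pack("<I", len(payload)) + payload + pad


body = make_chunk(b"LIST", b"hdrl" + b"\x00" * 5) + make_chunk(b"idx1", b"\x00" * 16)
garbage = bytes(range(8))  # stands in for a sector from another file
avi = b"RIFF" + struct.pack("<I", 4 + len(body)) + b"AVI " + body + garbage
chunks = list(walk_riff_chunks(avi))
print(chunks)
```

The offset at which the walk stops is where a carving routine would start trying other candidate fragments.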


DOC Carving

Word documents do not respond very well to simple data carving techniques.  There are several pointers in the file that can be used, but the recovery rate using automatic routines is not very high.


PSD (Photoshop) Carving

PSD files have a lot of records, all with embedded lengths and a simple 4 byte tag.  It is therefore possible to step through the file, validating each record and predicting where the next tag must be.  Knowing that the tag must be at a certain location within a cluster, it is normally possible to locate the correct cluster.
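
The stepping and cluster search described above can be illustrated with PSD image resource blocks, which start with the tag "8BIM", followed by a 2 byte ID, a Pascal-style name padded to an even length, a 4 byte big-endian data length, and the data padded to an even length.  Which records CnW actually follows is not stated here, so the choice of image resource blocks, the function names, and the 4096 byte cluster size are assumptions made for this sketch.

```python
import struct

TAG = b"8BIM"
CLUSTER_SIZE = 4096  # assumed cluster size


def next_block_offset(data, offset):
    """From a block at `offset`, return the file offset where the next tag must start."""
    if data[offset:offset + 4] != TAG:
        raise ValueError("no 8BIM tag at offset %d" % offset)
    pos = offset + 4 + 2                 # skip the tag and the 2 byte resource ID
    name_len = data[pos]
    pos += 1 + name_len
    if (1 + name_len) & 1:               # name field is padded to an even size
        pos += 1
    (size,) = struct.unpack_from(">I", data, pos)
    return pos + 4 + size + (size & 1)   # data is padded to an even size


def clusters_with_tag_at(image, in_cluster_offset, candidates, cluster_size=CLUSTER_SIZE):
    """Return the candidate clusters that hold the tag at the predicted position."""
    found = []
    for c in candidates:
        start = c * cluster_size + in_cluster_offset
        if image[start:start + 4] == TAG:
            found.append(c)
    return found


# The last block in the known first cluster says its data runs into the next one.
header = TAG + struct.pack(">H", 1005) + b"\x00\x00" + struct.pack(">I", 4184)
first_cluster = header.ljust(CLUSTER_SIZE, b"\x00")
target = next_block_offset(first_cluster, 0)

# Synthetic disk image: only cluster 3 has a tag at the predicted position.
image = bytearray(4 * CLUSTER_SIZE)
image[3 * CLUSTER_SIZE + 100:3 * CLUSTER_SIZE + 104] = TAG
found = clusters_with_tag_at(image, target % CLUSTER_SIZE, [1, 2, 3])
print(target, found)
```

The sketch assumes the file begins on a cluster boundary and ignores the case where the predicted tag straddles two clusters; a fuller routine would have to handle both.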