Data carving


Data carving, file carving or raw file recovery is a very powerful way to extract files and data from corrupted or damaged media.


Data carving, or file carving, is the process of reading files without reference to a file system. It is required when the media has been corrupted or changed, or when the disk is badly damaged. The technique can be applied to any type of disk that stores data on sector boundaries, which includes camera memory as well as hard drives. It is based on the fact that most files start with a recognisable data signature.

CnW Recovery software recovers raw files in two different ways:
Unread sectors are scanned after a logical read of files, i.e. only sectors that have not been used for files or seen within the file system are scanned.

Or a complete disk scan is performed, extracting all possible files. This will also recover files pointed to by the file system, including those that can be read as part of the operating system.

Extracting files from media while ignoring the file structure is performed by looking for unique patterns in the data. Typically, the first few bytes of a file give a good indication; for instance, all PK zip files start with the characters PK. The software has a growing list of such signatures built into the code. CnW Recovery goes much further than most recovery programs in that it will often generate a file name from metadata within the file. The list at the bottom of this page grows on a regular basis, so please contact us if a particular file type is required. Download the free demo program now and see the files that can be recovered.
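To illustrate the basic idea of signature matching, the sketch below tests the first bytes of every sector of a raw disk image against a small table of known file signatures. It is a minimal sketch, not CnW's implementation: the 512-byte sector size, the `SIGNATURES` table and the `find_file_starts` name are all assumptions made for illustration.

```python
# Minimal sketch of signature-based file detection (not CnW's code).
# Assumptions: 512-byte sectors and a small hypothetical signature table.
from typing import Dict, Iterator, Tuple

SECTOR_SIZE = 512  # assumed sector size

# Hypothetical table: leading bytes -> file type label.
SIGNATURES: Dict[bytes, str] = {
    b"PK\x03\x04": "zip",          # ZIP local file header ("PK")
    b"\xff\xd8\xff": "jpg",        # JPEG start-of-image marker
    b"\x89PNG\r\n\x1a\n": "png",   # PNG file signature
    b"%PDF-": "pdf",               # PDF header
}

def find_file_starts(image: bytes, sector_size: int = SECTOR_SIZE) -> Iterator[Tuple[int, str]]:
    """Yield (offset, type) for every sector whose first bytes match a signature.

    Only sector boundaries are tested, because carving assumes files start there.
    """
    for offset in range(0, len(image), sector_size):
        head = image[offset:offset + 16]
        for magic, kind in SIGNATURES.items():
            if head.startswith(magic):
                yield offset, kind
                break
```

A signature that appears in the middle of a sector is deliberately ignored, which keeps false positives down at the cost of missing files that are not sector aligned.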

Advanced data carving



CnW has some advanced features that go beyond simple signature recognition, listed below. Each feature depends on the actual file format.
File verification
Generation of filenames based on content
File length verification
Reconstructing fragmented files - eg fragmented video recovery

How is unallocated space read?

With each logical file system, e.g. NTFS or CD-ROM, there is an option to read the unallocated space. The disk is first read logically, so all used sectors are known. After this, all previously unread sectors are read, and each sector or cluster is tested to see if it is a possible start of a file. This is based on the first few characters of a file, sometimes followed by more extensive tests looking for a certain type of data. For NTFS disks, any file compression is detected and automatically expanded. It does not matter whether just one file or all files are compressed; they will be detected. The software will even detect NTFS compressed files left behind on a FAT, Mac, or HPOFS disk (more than many data recovery programs will achieve).
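The "logical read first, then scan what is left" sequence can be sketched as follows. The sketch assumes the logical read has already produced a per-cluster allocation map (`used`, one boolean per cluster); that map, the `unallocated_starts` function and the signature table are hypothetical and only show the order of operations.

```python
# Sketch: test only clusters the logical read did not account for.
# Assumption: `used` is a hypothetical per-cluster allocation map.
from typing import Dict, Iterator, Sequence, Tuple

# Hypothetical signature table (leading bytes -> type label).
DEFAULT_SIGNATURES: Dict[bytes, str] = {
    b"PK\x03\x04": "zip",
    b"\xff\xd8\xff": "jpg",
    b"%PDF-": "pdf",
}

def unallocated_starts(image: bytes, used: Sequence[bool], cluster_size: int,
                       signatures: Dict[bytes, str] = DEFAULT_SIGNATURES
                       ) -> Iterator[Tuple[int, str]]:
    """Yield (cluster_index, type) for unused clusters that begin with a known signature."""
    for index, in_use in enumerate(used):
        if in_use:
            continue  # already read as part of a file seen in the file system
        start = index * cluster_size
        head = image[start:start + 16]
        for magic, kind in signatures.items():
            if head.startswith(magic):
                yield index, kind
                break
```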

One issue with raw recovery is that the start of a file can be easy to detect, but its length is often unclear. CnW Recovery software will often keep adding to a file until a new unique start is found. In these cases a file can be shown as many MB, although the actual data is only 100 KB. Fortunately, in many cases an application will read such a file and ignore the erroneous data at the end. As the CnW Recovery software develops, files will be stored at the correct length where possible. With contiguous files, extraction rates can be extremely high; with fragmented files, there can be major problems. However, automatic file carving routines are being added to merge fragments of common types of files into a complete, readable file.
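The "continue until a new unique start is found" rule can be expressed as a simple split of the image at every detected start offset. The `carve_until_next_start` name and the optional `max_len` cap are assumptions for this sketch; the cap is one hypothetical way to stop a file from growing to many MB.

```python
# Sketch: each carved file runs from one detected start to the next one.
from typing import List, Optional, Tuple

def carve_until_next_start(image: bytes, starts: List[int],
                           max_len: Optional[int] = None) -> List[Tuple[int, bytes]]:
    """Return (offset, data) pairs, each ending where the next detected file begins.

    The last file runs to the end of the image. `max_len` is a hypothetical cap.
    """
    ordered = sorted(starts)
    carved = []
    for i, start in enumerate(ordered):
        end = ordered[i + 1] if i + 1 < len(ordered) else len(image)
        if max_len is not None:
            end = min(end, start + max_len)
        carved.append((start, image[start:end]))
    return carved
```

Because nothing here knows where a file really ends, the trailing bytes of each piece may belong to unrelated data, which is exactly the oversized-file effect described above.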

With some file types, it is possible to determine if the data is still a valid data stream, and if incorrect data is detected, no more data will be added to the potential output file.
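PNG is one format where stream validity can be checked as data is added: every chunk carries a CRC-32 over its type and data. The sketch below, which uses PNG purely as an illustration and is not CnW's routine, walks the chunks and stops at the first one whose CRC fails; the function name `valid_png_prefix` is hypothetical.

```python
# Sketch: stop adding data once a PNG chunk fails its CRC check.
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def valid_png_prefix(data: bytes) -> int:
    """Return how many leading bytes of `data` form a valid PNG chunk stream.

    Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC of type+data.
    """
    if not data.startswith(PNG_SIGNATURE):
        return 0
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_end = pos + 12 + length
        if chunk_end > len(data):
            break  # chunk runs past the carved data
        ctype = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:chunk_end])
        if zlib.crc32(ctype + body) != crc:
            break  # invalid data detected: add nothing further
        pos = chunk_end
        if ctype == b"IEND":
            break  # end of image reached
    return pos
```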

The following file types are detected by CnW Recovery software. The list grows on a regular basis, and we are happy to add new file types if sent details and examples; please contact CnW with any requests. The number of file types being verified and corrected is also increasing regularly.

Fragmented files that may give problems with data carving
Data carving should always be considered the last resort in data recovery, as it has severe limitations. There is no metadata, i.e. no name, date or valid size. There is also a big issue with fragmentation. Many files are not fragmented, but some are almost always fragmented, in particular email files such as .EDB and .PST. These are large files that grow over months or years, often to multi-GB lengths. Disk defragmentation may not help, and recovering a complete file of this kind with data carving is unlikely. Video files and photos can also be fragmented, and CnW has several routines that will attempt to reconstruct files from fragments.

Terminating strings
Some data carving programs search for terminating strings as well as starting strings. CnW Recovery software does not use this technique, as in CnW's experience it does not improve the results. Instead, CnW detects file starts and continues until a new file start is found. This can occasionally lead to very large files that are padded with blank data. However, as part of CnW's advanced carving techniques, many files are validated and their lengths determined by analysis after recovery.
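As an example of determining a length by analysis after recovery, a ZIP file ends with an "end of central directory" (EOCD) record whose last field gives the length of a trailing comment. The sketch assumes the carved data begins at the ZIP's first byte and that the archive is not ZIP64; `zip_true_length` is a hypothetical name, and the cross-check of central directory offset plus size against the EOCD position is one simple sanity test, not CnW's method.

```python
# Sketch: trim a carved ZIP to its real length using the EOCD record.
# Assumptions: carved data starts at the ZIP's first byte; no ZIP64.
import struct
from typing import Optional

EOCD_SIG = b"PK\x05\x06"  # end of central directory signature
EOCD_MIN = 22             # fixed part of the record, before the comment

def zip_true_length(carved: bytes) -> Optional[int]:
    """Return the real ZIP length, or None if no consistent EOCD record is found."""
    pos = carved.rfind(EOCD_SIG)
    while pos != -1:
        if pos + EOCD_MIN <= len(carved):
            # Offsets 12, 16, 20: central directory size, its offset, comment length.
            cd_size, cd_offset, comment_len = struct.unpack("<IIH", carved[pos + 12:pos + 22])
            end = pos + EOCD_MIN + comment_len
            # The central directory should end exactly where the EOCD begins.
            if end <= len(carved) and cd_offset + cd_size == pos:
                return end
        pos = carved.rfind(EOCD_SIG, 0, pos)  # try an earlier candidate
    return None
```

Any blank padding after the returned length can then be discarded.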

Data Carving and Reiser FS
Data carving assumes that every file starts at the beginning of a sector (strictly, a cluster). ReiserFS uses a very efficient method of storing data, which often means that a file starts in the middle of a sector and uses only part of it. For this reason, knowledge of the file system is required to recover the files.

Deduplication
Once the carving process has been completed, the log can be viewed and files that have the same hash value can be deduplicated. It is very common for unallocated space to contain multiple copies of the same file.
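Hash-based deduplication can be sketched in a few lines: compute a digest of each recovered file and keep only the first file seen for each digest. The choice of SHA-256 and the `deduplicate` function are assumptions; the source does not say which hash the log uses.

```python
# Sketch: deduplicate carved files by content hash (SHA-256 is an assumption).
import hashlib
from typing import Dict, Iterable, List, Tuple

def deduplicate(files: Iterable[Tuple[str, bytes]]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Keep the first file for each digest; map each kept file to its duplicates."""
    kept: List[str] = []
    first_by_hash: Dict[str, str] = {}
    duplicates: Dict[str, List[str]] = {}
    for name, data in files:
        digest = hashlib.sha256(data).hexdigest()
        if digest in first_by_hash:
            duplicates.setdefault(first_by_hash[digest], []).append(name)
        else:
            first_by_hash[digest] = name
            kept.append(name)
    return kept, duplicates
```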

File carving
Many applications describe this process as data carving or file carving; CnW takes it a few stages further. One unusual and very useful feature is the creation of meaningful file names, and many files are verified for content structure and length. This gives a much higher success rate than a simple string match at the start of a file. For handling fragmented files, see the sections on advanced data carving and manual data carving.

Searching for strings
The forensic and commercial options enable multiple strings to be searched, either on their own or while carving the disk. Multiple tables of strings can be saved for later use.
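A multi-string search over raw data can be sketched as below. Searching each term both as UTF-8 and as UTF-16LE (the encoding Windows commonly uses for text) is an assumption for illustration, as is the `search_strings` name; a table of terms is simply a list that could be saved and reused.

```python
# Sketch: find every occurrence of several search terms in raw data.
# Assumption: each term is searched as UTF-8 and as UTF-16LE.
from typing import Dict, Iterable, List, Sequence

def search_strings(data: bytes, terms: Iterable[str],
                   encodings: Sequence[str] = ("utf-8", "utf-16-le")) -> Dict[str, List[int]]:
    """Return the sorted byte offsets at which each term occurs, in any encoding."""
    hits: Dict[str, List[int]] = {}
    for term in terms:
        offsets: List[int] = []
        for enc in encodings:
            needle = term.encode(enc)
            pos = data.find(needle)
            while pos != -1:
                offsets.append(pos)
                pos = data.find(needle, pos + 1)  # allow overlapping matches
        hits[term] = sorted(offsets)
    return hits
```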

File names
In addition to the list below, various files are recognised by content rather than by signature; an example is Macintosh emails. Again, this list will grow. For some files, such as JPEG, DOC, MP3 and ZIP files, it is often possible to add details such as the file date to the file name.
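For ZIP files, a date can be read from the first local file header, which stores a DOS-format modification time at offset 10 and date at offset 12, followed by the first entry's name at offset 30. The naming scheme below (`carved_<date>_<first entry>.zip`) and the `zip_name_from_header` function are hypothetical; they only show how metadata inside a file can produce a meaningful name.

```python
# Sketch: build a file name from a ZIP local file header (naming scheme is hypothetical).
import struct
from datetime import datetime

def zip_name_from_header(data: bytes, fallback: str = "carved") -> str:
    """Name a carved ZIP from its first entry's DOS date/time and file name."""
    if len(data) < 30 or not data.startswith(b"PK\x03\x04"):
        return fallback + ".zip"
    mod_time, mod_date = struct.unpack("<HH", data[10:14])
    (name_len,) = struct.unpack("<H", data[26:28])
    first_entry = data[30:30 + name_len].decode("cp437", errors="replace")
    # Replace path separators and other awkward characters.
    safe = "".join(c if c.isalnum() or c in "._-" else "_" for c in first_entry)
    try:
        # DOS date: year-1980 (7 bits), month (4), day (5); time: hour, minute, seconds/2.
        stamp = datetime(1980 + (mod_date >> 9), (mod_date >> 5) & 0xF, mod_date & 0x1F,
                         mod_time >> 11, (mod_time >> 5) & 0x3F, (mod_time & 0x1F) * 2)
    except ValueError:
        return f"{fallback}_{safe}.zip"  # date fields do not form a valid date
    return f"{fallback}_{stamp:%Y-%m-%d_%H%M%S}_{safe}.zip"
```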

When the complete disk is scanned, it is very common for multiple instances of a single file to be recovered. This is due to the operating system moving files, or the result of a defrag operation. Fortunately, these duplicates can be removed using the Deduplication feature in the log.

Advanced data carving tools built into CnW Recovery software

Processing fragmented files

Smart data carving is the process of joining the fragments of a file together. It is needed when an operating system has failed and a file exists, but is stored in several places on the drive (or memory chip). There are easy ways to detect the start of a file, and often its end, based on signatures and known sequences of bytes. The problem in extracting a file correctly is finding the sections in the middle of the file. The tools for this are part of CnW software.

More than just file signature

File starts can be detected, and for several types of file, the file is then validated. This can be based on pointers found throughout the file, or simply on a length pointer at the start combined with a check of the end of the file. For several file types, when the file extracted from the start signature alone is deemed invalid, the data carving tools can be applied by selecting the Process Fragments option. The method of operation is different for each supported logical format, but the overall scheme is described below.
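RIFF-based files (such as AVI and WAV) show both kinds of validation: the header holds the total length, and every chunk inside holds its own size. The sketch below, with the hypothetical name `riff_chunks_valid`, follows the chain of chunk sizes and accepts the file only if the chain lands exactly on the declared end. It is an illustration of the idea, not CnW's verifier.

```python
# Sketch: validate a carved RIFF file via its length field and chunk chain.
import struct

def riff_chunks_valid(data: bytes) -> bool:
    """True if the top-level chunk chain ends exactly at the declared RIFF length.

    Layout: b"RIFF", 4-byte little-endian size (bytes after the first 8), 4-byte
    form type, then chunks of 4-byte ID, 4-byte size and data padded to even length.
    """
    if len(data) < 12 or data[:4] != b"RIFF":
        return False
    (riff_size,) = struct.unpack("<I", data[4:8])
    end = riff_size + 8
    if end > len(data):
        return False  # carved data is shorter than the declared length
    pos = 12  # skip "RIFF", the size field and the form type (e.g. b"AVI ")
    while pos + 8 <= end:
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        pos += 8 + size + (size & 1)  # chunks are padded to a 2-byte boundary
    return pos == end
```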

When the data carve function is performed, sectors that are part of known good files are marked as used. Verifying as many file types as possible is therefore an important part of the data carve function. Once a file is determined to be incomplete, it is analysed to determine how much of it is valid. This might be just the first few KB, or 90%. Routines are then used to search areas of the disk that are apparently unused, and the data in these areas is tested for suitability. For instance, if a JPEG file is being repaired, a section of pure text can be skipped. To improve the chances of good recovery, the starting sectors of correctly recovered files are also analysed to determine the cluster size and file locations. By looking only at complete clusters, file fragments can be verified for suitability.
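Two of the steps above lend themselves to a short sketch: guessing the cluster size from the start offsets of correctly recovered files, and rejecting candidate clusters that are plain text when repairing a binary file such as a JPEG. Both functions, the 512-byte minimum, the 64 KB maximum and the 95% printable threshold are assumptions chosen for illustration; offsets are taken to be measured from the start of the volume's data area.

```python
# Sketch: infer cluster size from verified file starts, and filter text clusters.
# Assumptions: offsets are relative to the data area; thresholds are illustrative.
from functools import reduce
from math import gcd
from typing import Iterable

def infer_cluster_size(start_offsets: Iterable[int], sector_size: int = 512,
                       max_cluster: int = 65536) -> int:
    """Largest power-of-two size (at least one sector) dividing every start offset."""
    common = reduce(gcd, start_offsets, 0)
    if common == 0:
        return sector_size  # no usable evidence
    size = sector_size
    while size * 2 <= max_cluster and common % (size * 2) == 0:
        size *= 2
    return size

def looks_like_text(block: bytes, threshold: float = 0.95) -> bool:
    """True if nearly every byte is printable ASCII, tab, CR or LF."""
    if not block:
        return False
    printable = sum(1 for b in block if 32 <= b < 127 or b in (9, 10, 13))
    return printable / len(block) >= threshold
```

A fragment search for a JPEG could then visit only free clusters of the inferred size for which `looks_like_text` is false.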

Video Fragments and data carving

For most AVI files, data carving is fairly straightforward. The file structure has many sequential pointers, so it is straightforward to determine whether the next pointer is at the correct location within a cluster. For non-critical tasks, an AVI can also accept runs of unrelated data: the display may jump slightly, but overall the video will still play. However, there is a second version of the AVI file in which the camera stores the data first, followed by the header and indexes. A special recovery routine has been added to detect and recover such files. Files such as JPEGs are less friendly; at times a JPEG may appear to be reconstructed when in fact the image is a mosaic.
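The pointer check for AVI fragments can be sketched like this: when the valid part of a file ends, the size of the last chunk says at which offset within the next cluster the following chunk header should appear. Each free cluster is then tested for a plausible chunk header at exactly that offset. The accepted chunk IDs reflect common AVI usage (`00dc`, `01wb`, `LIST`, `JUNK`, `idx1`), while the 16 MB size cap and the function names are assumptions.

```python
# Sketch: find free clusters whose bytes at the expected offset look like an AVI chunk.
import re
import struct
from typing import Iterable, Iterator

# Stream chunk IDs: two digits plus a two-letter type, e.g. b"00dc" (video), b"01wb" (audio).
STREAM_ID = re.compile(rb"[0-9]{2}(db|dc|wb|pc)")
OTHER_IDS = {b"LIST", b"JUNK", b"idx1"}

def plausible_avi_chunk(header: bytes, max_size: int = 16 * 1024 * 1024) -> bool:
    """Check whether 8 bytes look like an AVI chunk header (ID + plausible size)."""
    if len(header) < 8:
        return False
    fourcc = header[:4]
    (size,) = struct.unpack("<I", header[4:8])
    known = STREAM_ID.fullmatch(fourcc) is not None or fourcc in OTHER_IDS
    return known and size <= max_size  # max_size is an assumed sanity limit

def next_fragment_candidates(image: bytes, free_clusters: Iterable[int], cluster_size: int,
                             offset_in_cluster: int) -> Iterator[int]:
    """Yield free clusters with a plausible chunk header at the expected offset."""
    for index in free_clusters:
        base = index * cluster_size + offset_in_cluster
        if plausible_avi_chunk(image[base:base + 8]):
            yield index
```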

The case of MP4 files is more complex. For files stored on a hard drive, data carving often works, but for files on the original memory chip there can be big issues. CnW has a series of Wizard functions for very intelligent data carving, which even allow for GoPro cameras with interleaved high- and low-resolution video streams.
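As general background (not a description of CnW's Wizard functions), an MP4 file is a sequence of boxes, each starting with a 4-byte big-endian size and a 4-byte type. A carved MP4 normally needs both its media data (`mdat`) and its index (`moov`) to play, so a first structural check is to list the top-level boxes. The function names below are hypothetical, and the `looks_playable` test is deliberately rough.

```python
# Sketch: list top-level MP4 (ISO BMFF) boxes and make a rough completeness check.
import struct
from typing import List, Tuple

def top_level_boxes(data: bytes) -> List[Tuple[str, int, int]]:
    """Return (type, offset, size) for each top-level box that fits in the data.

    Size 1 means a 64-bit size follows the type; size 0 means "to end of file".
    """
    boxes = []
    pos = 0
    while pos + 8 <= len(data):
        size, kind = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:
            if pos + 16 > len(data):
                break
            (size,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif size == 0:
            size = len(data) - pos
        if size < header or pos + size > len(data):
            break  # truncated or corrupt box
        boxes.append((kind.decode("latin-1"), pos, size))
        pos += size
    return boxes

def looks_playable(data: bytes) -> bool:
    """Rough check: both media data and the movie index must be present."""
    kinds = {kind for kind, _, _ in top_level_boxes(data)}
    return {"mdat", "moov"} <= kinds
```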

Forensic implications
Forensically, one has to be slightly cautious about automatic data carving, but for most users the results are very useful. In a recent test on a corrupted 2 GB memory chip, carving produced about 120 good files and 60 corrupted files. After the data carving routine was run, about 40 of the corrupted files were reconstructed.

Carving on NTFS compressed disks
A very significant feature of CnW software is that, because it processes NTFS compressed clusters, it can carve files from NTFS compressed disks.
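NTFS compression uses the LZNT1 algorithm, documented publicly by Microsoft. The sketch below decompresses an LZNT1 buffer to show what processing compressed clusters involves; it follows the published format and is not CnW's code. The function names are hypothetical, and raising `ValueError` on a bad back-reference is one possible way to reject data that is not really compressed.

```python
# Sketch: LZNT1 decompression (the NTFS compression format), for illustration only.
import struct

def lznt1_decompress(data: bytes) -> bytes:
    """Decompress a series of LZNT1 chunks.

    Each chunk has a 16-bit little-endian header: bit 15 set means compressed,
    the low 12 bits hold (chunk data length - 1). A zero header ends the stream.
    """
    out = bytearray()
    pos = 0
    while pos + 2 <= len(data):
        (header,) = struct.unpack("<H", data[pos:pos + 2])
        if header == 0:
            break
        size = (header & 0x0FFF) + 1
        chunk = data[pos + 2:pos + 2 + size]
        pos += 2 + size
        out += _decompress_chunk(chunk) if header & 0x8000 else chunk  # else stored raw
    return bytes(out)

def _decompress_chunk(chunk: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(chunk):
        flags = chunk[i]  # one flag bit per following token, least significant first
        i += 1
        for bit in range(8):
            if i >= len(chunk):
                break
            if not (flags >> bit) & 1:
                out.append(chunk[i])  # literal byte
                i += 1
                continue
            if i + 2 > len(chunk):
                break  # truncated back-reference
            (token,) = struct.unpack("<H", chunk[i:i + 2])
            i += 2
            # The offset/length bit split depends on how much of this chunk is decoded.
            p, length_mask, offset_shift = len(out) - 1, 0x0FFF, 12
            while p >= 0x10:
                length_mask >>= 1
                offset_shift -= 1
                p >>= 1
            length = (token & length_mask) + 3
            offset = (token >> offset_shift) + 1
            if offset > len(out):
                raise ValueError("corrupt LZNT1 back-reference")
            for _ in range(length):
                out.append(out[-offset])  # byte-by-byte copy handles overlapping runs
    return bytes(out)
```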