My Road to Digital Forensics Excellence

File Carving Experiments, Part 3

Posted by Paul Bobby on November 4, 2008

When a file carve is manual (or at least a lengthy session with an extensive manual component), I wish there were a way to remove the areas of the UA that have been verified as part of a carved file; the UA would get smaller, and there would be less raw data to review. Until then I have to settle for highlighting portions of the UA in EnCase and bookmarking them, just to give them some color. However, highlighting large sections of the text/hex view is cumbersome and error-prone, so when I perform this task I use a custom text style:


You can modify this to parse the text at sector size, 1024 bytes, or whatever width suits you. It makes the highlighting much easier.


After carving out the files that come with specific file footers, my next approach is to carve out the files whose length can be extracted from the file header. The best examples are executable files (including .exe, .dll, .sys and .ax, and probably many others I’m not aware of).

File Finder found 33 hits. It found so many because the executable file signature is simply ‘MZ’, and the File Finder enscript searches the entire UA without validating the EXE (that is, without interpreting the header data). The next best approach is to export a fixed file size for each search hit; it turns out that if you export 1 megabyte of data but the executable is only, say, 64k in size, the executable will still run just fine. The downside is that the above 33 hits created 33 one-megabyte files, when there was only one executable to begin with.
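To illustrate what "validating the EXE" could look like, here is a minimal Python sketch. It is not the File Finder enscript; the function names `looks_like_pe` and `find_mz_hits` are made up for this example. It relies only on the documented PE layout: the DOS header stores the offset of the PE header at 0x3C, and the PE header begins with the bytes `PE\0\0`.

```python
import struct

def looks_like_pe(buf: bytes, offset: int) -> bool:
    """Return True if buf[offset:] starts with a plausible MZ/PE header."""
    # The DOS header is 64 bytes long; e_lfanew (at 0x3C) points to the PE header.
    if buf[offset:offset + 2] != b"MZ" or offset + 0x40 > len(buf):
        return False
    (e_lfanew,) = struct.unpack_from("<I", buf, offset + 0x3C)
    pe_at = offset + e_lfanew
    # An out-of-range pointer yields an empty slice, which fails the comparison.
    return buf[pe_at:pe_at + 4] == b"PE\x00\x00"

def find_mz_hits(buf: bytes):
    """Yield (offset, validated) for every 'MZ' occurrence in buf."""
    pos = buf.find(b"MZ")
    while pos != -1:
        yield pos, looks_like_pe(buf, pos)
        pos = buf.find(b"MZ", pos + 1)
```

Filtering on the second element of each tuple should discard most coincidental ‘MZ’ hits before anything is exported.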

The Cluster Boundary search was much more successful, resulting in the two hits below:


Notice the 0xFFs before the first ‘MZ’ hit (they came from our deliberate wipe). The second hit is visible in the preview, but it was found within the 32 bytes pulled from the cluster boundary and didn’t actually occur on the boundary itself. This is a limitation of my code and will be improved upon.
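The idea of a cluster boundary search can be sketched in a few lines of Python. This is an illustration only, not the enscript: the function name is hypothetical, the 4096-byte cluster size is an assumption, and the 32-byte window mirrors the 32 bytes mentioned above. Returning the position of the signature inside the window makes it possible to tell a true boundary hit (position 0) from one that merely falls within the window.

```python
def cluster_boundary_hits(buf: bytes, signature: bytes,
                          cluster_size: int = 4096, window: int = 32):
    """Return (cluster_offset, position_in_window) for each matching cluster.

    cluster_size is an assumed value; window is the number of bytes
    inspected at the start of every cluster.
    """
    hits = []
    for start in range(0, len(buf), cluster_size):
        pos = buf[start:start + window].find(signature)
        if pos != -1:
            hits.append((start, pos))
    return hits
```

Keeping only the hits whose position is 0 would drop matches like the second one described above.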

So how do we carve out the executable found at line 1 above? There are a few ways to do it.

  1. Simple Export to Binary (from the search hit), exporting, say, 1 megabyte of data (sound familiar?). If the executable is less than 1 megabyte in size, ta-da, you have it.
  2. Change the text view to display 4096 bytes at a time, or multiple sectors at a time, and highlight consecutive lines of text until you ‘think’ you have reached the next executable. Export to binary and test.
  3. Use the PE Extractor enscript I wrote three articles ago 🙂

Approach #3 produces an executable that is sure to give an MD5 match if we have samples to compare with, and it also lets us color off this section of the UA accurately without worrying about missing other files to carve.
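For readers without the PE Extractor enscript, the following Python sketch shows one common way to derive an executable's length from its headers. It is not that enscript, and the function name `pe_file_size` is hypothetical. It takes the largest PointerToRawData + SizeOfRawData found in the section table, so any data appended after the last section (an overlay, for example) is not counted, and an MD5 match is not guaranteed for such files.

```python
import struct

def pe_file_size(buf: bytes, offset: int = 0):
    """Estimate the on-disk size of the PE at buf[offset:], or None if invalid."""
    if buf[offset:offset + 2] != b"MZ" or offset + 0x40 > len(buf):
        return None
    (e_lfanew,) = struct.unpack_from("<I", buf, offset + 0x3C)
    pe = offset + e_lfanew
    if buf[pe:pe + 4] != b"PE\x00\x00" or pe + 24 > len(buf):
        return None
    # The 20-byte COFF file header follows the 4-byte signature.
    (num_sections,) = struct.unpack_from("<H", buf, pe + 6)
    (opt_size,) = struct.unpack_from("<H", buf, pe + 20)
    sect_table = pe + 24 + opt_size
    # The file is at least as long as its headers and section table.
    end = (sect_table - offset) + num_sections * 40
    for i in range(num_sections):
        entry = sect_table + i * 40
        if entry + 40 > len(buf):
            return None
        # SizeOfRawData is at +16 and PointerToRawData at +20 in each entry.
        raw_size, raw_ptr = struct.unpack_from("<II", buf, entry + 16)
        end = max(end, raw_ptr + raw_size)
    return end
```

With the size known, exporting `buf[offset:offset + size]` gives a candidate file, and the same range is what can be colored off in the UA.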


A PDF file appears to end with %%EOF, so I’m surprised that this footer is not included in the File Finder enscript.

The File Finder enscript, using the default approach, found 286 hits and created 286 one-megabyte files. To find the only PDF files worth testing, I cheated and compared the PDF search hits to the starting locations of the PDFs on the unformatted drive. The first PDF produced a corruption error when viewed (in Foxit Reader), and the second and third hits both produced the same PDF, namely dvd-quote.pdf. None of the PDF hits was Forensics-Paper-2.pdf; this might be a bug.

I considered adding the PDF header and footer to the File Finder enscript module, much like the header/footer for JPGs discussed previously. However, the ‘%%EOF’ footer can occur multiple times within a PDF, so this approach would fail. The Scalpel utility gets around the issue by assuming all PDFs are 5 megabytes at most, and carving up to and including the last %%EOF footer hit.
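The "last footer within a maximum size" idea can be sketched in Python as follows. This is an illustration of the approach described above, not Scalpel's code; the function name `carve_pdf` and the exact byte count used for "5 megabytes" are assumptions.

```python
def carve_pdf(buf: bytes, start: int, max_size: int = 5_000_000):
    """Carve from a %PDF- header at `start` through the last %%EOF in range.

    Returns the carved bytes, or None if there is no header at `start`
    or no footer within max_size bytes of it.
    """
    if buf[start:start + 5] != b"%PDF-":
        return None
    window = buf[start:start + max_size]
    last = window.rfind(b"%%EOF")
    if last == -1:
        return None
    end = last + len(b"%%EOF")
    # Keep up to two end-of-line bytes that follow the footer.
    for _ in range(2):
        if window[end:end + 1] in (b"\r", b"\n"):
            end += 1
    return window[:end]
```

One consequence of this design: if a second PDF begins inside the window, the carve runs on to that file's last %%EOF, so the result can contain more than one document.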

My Cluster Boundary search produced 4 search hits, as below (the 0xFF wipe pattern is handy):


(1609728/512) + 3991 = 7135 (same as Forensics-GCFA-Paper.pdf)

(1699840/512) + 3991 = 7311 (coincidental file signature hit on a cluster boundary, in the middle of an existing PDF)

(2580480/512) + 3991 = 9031 (same as Forensics-Paper-2.pdf)

(3276800/512) + 3991 = 10391 (same as dvd-quote.pdf)
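The arithmetic above converts a byte offset into a sector number. A small Python helper makes the conversion repeatable; the name `hit_to_sector` is hypothetical, and reading 3991 as the sector at which the searched region begins is an assumption based on how the constant is used in the sums.

```python
SECTOR_SIZE = 512
BASE_SECTOR = 3991  # constant added in each sum above; assumed start of the searched region

def hit_to_sector(byte_offset: int, base: int = BASE_SECTOR,
                  sector_size: int = SECTOR_SIZE) -> int:
    """Convert a sector-aligned byte offset of a search hit to a sector number."""
    sectors, remainder = divmod(byte_offset, sector_size)
    if remainder:
        raise ValueError("offset is not sector-aligned")
    return base + sectors

for offset in (1609728, 1699840, 2580480, 3276800):
    print(offset, hit_to_sector(offset))
# prints 7135, 7311, 9031 and 10391 for the four offsets
```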

How to extract manually? Again, highlighting and exporting by hand is probably the best bet. The Scalpel approach of carving until you find the last %%EOF may work. Another innovative approach is to carve until you come across a series of 0x00s. To the best of my knowledge, RAM slack no longer contains RAM data: since at least Windows XP, the OS deliberately writes 0x00s, rather than data from RAM, to fill up the remainder of a sector.
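The "carve until a series of 0x00s" idea might look like the Python sketch below. The function name and the run length of 16 bytes are assumptions chosen for illustration. Since binary streams inside a file can legitimately contain runs of zeros, the result should be treated as a candidate to verify rather than a finished carve.

```python
def carve_until_zero_run(buf: bytes, start: int, min_run: int = 16):
    """Return buf[start:] up to the first run of min_run zero bytes, else None."""
    pos = buf.find(b"\x00" * min_run, start)
    if pos == -1:
        return None
    return buf[start:pos]
```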

What’s next?

I’ve a lot of work still to do. There’s the whole matter of fragmented files (I’m still getting my head around some of the theory I’ve read in DFRWS papers), extracting other file types by parsing header contents (à la executable files), and I’ve still to test carving out the remaining file types from my test case.

Maybe there will be a part 4, 5 and 6…

