Okay, so the title is a bit of an exaggeration, but still the task that I managed to do today was something that I had never believed in my wildest dreams that I would be able to accomplish. I just learnt how to recover deleted files using just about hundred lines of C code. Yes! you heard (or rather read) it right. All this was made possible due to the amazing CS50x MOOC that I am taking on edX, an online learning platform.
CS50x is the edX version of the popular CS50 course of the Harvard University, taught by Prof. David J. Malan who is an exceptionally great teacher and one of the best instructor that I have ever had the pleasure of learning from. So as part of the 5th Problem Set (more on first five later), we (who we? the people who are taking the course, duh!) were required to retrieve about 50 jpeg images, which had supposedly been deleted “by mistake”. To solve this problem, some basic knowledge of the JPEG file format, FAT file system and how the images are stored in a CompactFlash (CF) card are stored is required.
So we all know (well at least most of us) that an image is nothing but a large number of pixel values that are stored in some specific order with a specific format. It turns out that the JPEGs too have patterns of bytes that distinguish them from other file formats. In fact, most JPEGs begin with one of two sequences of bytes. Specifically, the first four bytes of most JPEGs are either
0xff 0xd8 0xff 0xe0
0xff 0xd8 0xff 0xe1
from first byte to fourth byte, left to right. Odds are, if you find one of these patterns of bytes on a disk known to store photos then that means that they de-mark the start of a JPEG.
Also it turns out that when you delete a file in a FAT file system, it is not actually erased from memory. What actually happens is that the system modifies the filename’s first character in the file’s directory entry to signal that the file has been deleted and that the directory entry can be recycled. Second, the system moves all of the file’s FAT clusters to the hard drive’s list of free clusters. Thus, as you might guess, we can easily recover any deleted files by scavenging through the file-system for some specific byte pattern.
You can read more about it from here : http://d2o9nyf4hwsci4.cloudfront.net/2013/fall/psets/5/garfinkel.pdf
Furthermore, digital cameras tend to store photographs contiguously on CF cards, whereby each photo is stored immediately after the previously taken photo. Accordingly, the start of a JPEG usually demarks the end of another. However, digital cameras generally initialize CF cards with a FAT file system whose “block size” is 512 bytes (B). The implication is that these cameras only write to those cards in units of 512 B. A photo that’s 1 MB (i.e., 1,048,576 B) thus takes up 1048576 ÷ 512 = 2048 “blocks” on a CF card. But so does a photo that’s, say, one byte smaller (i.e., 1,048,575 B)! The wasted space on disk is called “slack space.” Forensic investigators often look at slack space for remnants of suspicious data.
So now lets come to our problem. According to the problem statement, we must read a *.raw format file, which is nothing but a “forensic image” of the card and store its contents, byte after byte. We just open the file using the fopen(…) function and then read through it using the fread(…) function. Also we read in 512 bytes into the buffer at a time (because a block is stored as 512 bytes in FAT file-system) and then check if the starting 4 bytes represent the start of a JPEG file. If they do, then we have found an image and we write all the block’s data into a file using fwrite(…), and continue writing the next blocks into the same file until we encounter another block which starts with JPEG specific 4 byte format. Since our CF card stores images contiguously as mentioned earlier, we need not worry about random files and garbage values coming in the middle of two blocks. When we find a another pattern that signifies an new jpeg image, we close our present file and create a new file with a new name, and start writing the data into it until we encounter another JPEG start specific byte pattern.
Continuing this way, we will reach the end of the file, at which point we must close our input *.raw file and our newly written *.jpeg file.
If you do everything right (as I did :P), you will hopefully end up recovering your deleted images. Does’nt it sound too easy? Well, it actually is. It took me less than two hours to go through the entire problem statement for the Forensics assignment and complete it. Also it was the most interesting piece if code that I had ever written (I am not a Computer Science student, and no I don’t code all day). And having done that, I am now going to insert my camera memory card into my computer, delete all the images and then hopefully recover them successfully. So have fun trying this yourself and as they say, “This is CS50”.