Continuing from where we left off, let's examine USNF.EXE in a hex editor to get a better understanding of the file structure. The file command identifies it as an MS-DOS executable. Loading it into a hex editor:
DOS exe header
A good description of the DOS exe header can be found at http://www.delorie.com/djgpp/doc/exe/
|Offset||Size||Value||Description|
|0x00||word||0x5a4d||Signature: the ASCII characters 'MZ'.|
|0x02||word||0x0150 (336 decimal)||Number of bytes actually used in the last 512-byte block. Zero means the entire last block is used (effectively 512).|
|0x04||word||0x000d (13)||Number of 512-byte blocks in the file that are part of the EXE, counting the header and the partially used last block.|
|0x06||word||0x005d (93)||Number of relocation entries.|
|0x08||word||0x001c (28)||Size of the header in 16-byte paragraphs. The program data starts immediately after the header.|
|0x0a||word||0x0007 (7)||Minimum number of additional 16-byte paragraphs required by the program.|
|0x0c||word||0xffff||Maximum number of additional 16-byte paragraphs requested by the program.|
|0x0e||word||0x0176||Relative value of the stack segment (SS). Added to the segment the program was loaded at.|
|0x10||word||0x0064||Initial value of the SP register.|
|0x12||word||0x0000||Checksum. If set, the 16-bit sum of all words in the file should be zero. Normally not filled in.|
|0x14||word||0x0000||Initial value of the IP register.|
|0x16||word||0x0020||Relative value of the code segment (CS). Added to the segment the program was loaded at.|
|0x18||word||0x0040||Offset of the relocation table (relative to the start of the file).|
|0x1a||word||0x0000||Overlay number. Zero indicates the main program.|
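To double-check the table, here is a minimal Python sketch that unpacks the header with the standard struct module. The field names are our own labels, not official ones, and the demo packs the values from the table back into bytes rather than reading the real file.

```python
import struct

# Our own labels for the fourteen header words, in table order (0x00 to 0x1a).
MZ_FIELDS = (
    "signature", "bytes_in_last_block", "blocks_in_file", "num_relocs",
    "header_paragraphs", "min_extra_paragraphs", "max_extra_paragraphs",
    "ss", "sp", "checksum", "ip", "cs", "reloc_table_offset", "overlay_number",
)

def parse_mz_header(data: bytes) -> dict:
    """Unpack the 28-byte MZ header as fourteen little-endian 16-bit words."""
    header = dict(zip(MZ_FIELDS, struct.unpack_from("<14H", data, 0)))
    if header["signature"] != 0x5A4D:  # b"MZ" read as a little-endian word
        raise ValueError("not an MZ executable")
    return header

# USNF.EXE's header values from the table, packed back into bytes for the demo.
sample = struct.pack("<14H", 0x5A4D, 0x0150, 0x000D, 0x005D, 0x001C, 0x0007,
                     0xFFFF, 0x0176, 0x0064, 0x0000, 0x0000, 0x0020, 0x0040, 0x0000)
hdr = parse_mz_header(sample)
print(hdr["num_relocs"], hex(hdr["reloc_table_offset"]))  # 93 0x40
```

To parse the real file instead, pass the bytes of USNF.EXE (opened in binary mode) to parse_mz_header.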
Finding the start of the program data:
The word at offset 0x08 tells us the size of the MZ header: 0x1c, or 28 paragraphs. The program data therefore starts at offset 0x1c0, or 448 bytes (28 x 16), from the start of the file. Indeed, we find a copyright message at this location.
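The same arithmetic as a tiny Python helper (the function name is our own):

```python
PARAGRAPH = 16  # a real-mode paragraph is 16 bytes

def load_module_offset(header_paragraphs: int) -> int:
    """File offset of the first byte after the MZ header (the program data)."""
    return header_paragraphs * PARAGRAPH

start = load_module_offset(0x1C)
print(hex(start), start)  # 0x1c0 448
```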
Finding the program entry point:
The words at offsets 0x14 and 0x16 give the offset (IP) and relative segment (CS) of the program entry point. To locate the entry point in the file, add the relative code segment, 0x20 paragraphs or 0x200 bytes, to the start of the program data, then add the initial IP value (zero in our case): 0x1c0 + 0x200 + 0x0 = 0x3c0. The program entry point should therefore be at file offset 0x3c0.
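The entry point calculation, again as a hypothetical helper that takes the header words from the table:

```python
def entry_point_file_offset(header_paragraphs: int, cs: int, ip: int) -> int:
    """File offset of the entry point CS:IP; CS is relative to the program data."""
    program_start = header_paragraphs * 16
    return program_start + cs * 16 + ip

print(hex(entry_point_file_offset(0x1C, 0x0020, 0x0000)))  # 0x3c0
```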
Calculating the program size:
The words at offsets 0x02 and 0x04 tell us the image is 12 full 512-byte blocks plus 336 bytes in a final, partially used 13th block. This gives a size of (12 x 512) + 336 = 6,480 bytes. According to the delorie page, this count already includes the header, so the MZ image ends at file offset 6,480 (0x1950) and the program data itself is 6,480 - 448 = 6,032 bytes long. (Adding the start offset 0x1c0 on top would instead give 0x1b10, or 6,928 decimal.) Jumping to 0x1b10 in the file, we find a bunch of zeroes, but there is something interesting at 0x1b50 that could be a Windows PE header. We'll look at it in more detail later.
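For comparison, the delorie page computes the end of the EXE image directly from the block count, without adding the header size, because the count already covers the header. A minimal sketch of that convention (the helper name is our own):

```python
def mz_image_end(blocks_in_file: int, bytes_in_last_block: int) -> int:
    """File offset just past the MZ image, header included (delorie's convention)."""
    end = blocks_in_file * 512
    if bytes_in_last_block:
        end -= 512 - bytes_in_last_block  # only part of the last block is used
    return end

end = mz_image_end(0x0D, 0x0150)
module_size = end - 0x1C * 16  # subtract the 448-byte header
print(hex(end), module_size)  # 0x1950 6032
```

Anything in the file past this offset is extra data that the DOS loader does not load.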
The word at offset 0x18 gives the start of the relocation table, 0x40 in our case, and the word at offset 0x06 gives the number of relocation entries (93). Each entry is four bytes: a 16-bit offset followed by a 16-bit segment. Like CS and SS in the header, the segment is relative to the start of the program. For each entry, the loader adds the start segment address to the word at that segment:offset address.
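A simplified Python model of what the loader does with the table, operating on a copy of the program data rather than on memory. The function names and the one-entry demo table are hypothetical; DOS performs this step internally when it loads the program.

```python
import struct

def read_relocations(data: bytes, table_offset: int, count: int) -> list:
    """Return (offset, segment) pairs; each 4-byte entry is offset word, then segment word."""
    return [struct.unpack_from("<HH", data, table_offset + 4 * i) for i in range(count)]

def apply_relocations(image: bytearray, relocs: list, start_segment: int) -> None:
    """Add start_segment to each word that a relocation entry points at.

    `image` holds the program data (everything after the header), so an entry's
    segment:offset maps to byte position segment * 16 + offset inside it.
    """
    for offset, segment in relocs:
        pos = segment * 16 + offset
        (word,) = struct.unpack_from("<H", image, pos)
        struct.pack_into("<H", image, pos, (word + start_segment) & 0xFFFF)

# Hypothetical demo: one entry pointing at 0000:0004 in a 16-byte image of zeroes.
table = struct.pack("<HH", 0x0004, 0x0000)
image = bytearray(16)
apply_relocations(image, read_relocations(table, 0, 1), 0x01ED)
print(image[4:6].hex())  # ed01
```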
Calculate the start segment address:
We know the program data starts at offset 0x1c0 in the file (from the word at offset 0x08). To find where it was loaded in memory, subtract the relative code segment at offset 0x16 (0x20) from the initial value of the CS register observed at run time (0x020D). This gives a start segment address of 0x01ED: the segment at which the program data was loaded.
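The subtraction as a small helper, plus the runtime stack segment it implies, assuming SS is relocated the same way as CS (as the header table describes):

```python
def load_segment(runtime_cs: int, header_cs: int) -> int:
    """Segment the program data was loaded at: observed CS minus the header's relative CS."""
    return (runtime_cs - header_cs) & 0xFFFF

start_segment = load_segment(0x020D, 0x0020)
print(f"{start_segment:04X}")           # 01ED
print(f"{start_segment + 0x0176:04X}")  # 0363, the expected initial SS
```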
Let's examine the first two relocation entries:
Adding their offsets to the start of the program data, we find that the file contains zeroes at offsets 0x1fc and 0x200. In the memory view of the loaded program at the corresponding addresses (01ED:003C and 01ED:0040), we can confirm the zeroes have been changed to 0x01ED.
In fact, the bytes from 0x003C to 0x004F appear to consist of five segment:offset pairs, all pointing to locations within the starting segment.
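A sketch of the two conversions used here, assuming both entries have segment 0x0000 (which matches the file offsets just given): a file offset is header size + segment x 16 + offset, and a real-mode linear address is segment x 16 + offset. The helper names are our own.

```python
def file_offset(header_paragraphs: int, segment: int, offset: int) -> int:
    """File offset of a segment:offset address that is relative to the program data."""
    return header_paragraphs * 16 + segment * 16 + offset

def linear_address(segment: int, offset: int) -> int:
    """Real-mode physical address of segment:offset."""
    return segment * 16 + offset

START_SEGMENT = 0x01ED
for seg, off in [(0x0000, 0x003C), (0x0000, 0x0040)]:
    print(hex(file_offset(0x1C, seg, off)),
          f"{START_SEGMENT + seg:04X}:{off:04X}",
          hex(linear_address(START_SEGMENT + seg, off)))
# 0x1fc 01ED:003C 0x1f0c
# 0x200 01ED:0040 0x1f10
```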
Let's look at another, non-trivial example, near the program entry point:
Look closely at the instruction at 020D:0002:
9A 02 00 7C 02 call 027C:0002
Compared with the original bytes in the source file:
9A 02 00 8F 00
Note that the word 0x008F at address 020D:0005, the segment operand of the far call, has had 0x1ED added to it, becoming 0x027C as expected. Let's find the relocation entry for this address.
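A minimal decoder for the 5-byte direct far call (opcode 0x9A). The operand stores the offset word before the segment word, which is why the segment of the instruction at 020D:0002 sits at 020D:0005. Only this one instruction form is handled; the function name is our own.

```python
import struct

def decode_far_call(code: bytes) -> tuple:
    """Decode a direct far CALL (opcode 0x9A): offset word, then segment word."""
    if code[0] != 0x9A:
        raise ValueError("not a direct far call")
    offset, segment = struct.unpack_from("<HH", code, 1)
    return segment, offset

in_file = bytes.fromhex("9A 02 00 8F 00")
seg, off = decode_far_call(in_file)
print(f"call {seg:04X}:{off:04X}")           # call 008F:0002
print(f"call {seg + 0x01ED:04X}:{off:04X}")  # call 027C:0002
# The segment word starts 3 bytes into the instruction: 020D:0002 + 3 = 020D:0005.
```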
Look at offset 0x0054 from the beginning of the file:
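This is the sixth entry in the table (index (0x54 - 0x40) / 4 = 5), and it holds offset 0x0005 and segment 0x0020. Remember, our program was loaded at segment 0x01ED. Adding 0x0020 to that gives 0x020D (the initial value of the CS register), and the offset 0x0005 then points exactly at the word that was adjusted in our call instruction!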
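Putting the pieces together for the entry at 0x54. The hex dump isn't reproduced here, so the raw bytes 05 00 20 00 below are reconstructed from the offset and segment values described above rather than copied from the file. The constants come from the header table and the load segment worked out earlier.

```python
import struct

RELOC_TABLE_OFFSET = 0x40  # word at 0x18
HEADER_PARAGRAPHS = 0x1C   # word at 0x08
START_SEGMENT = 0x01ED     # load segment observed at run time

def describe_entry(entry_pos: int, entry_bytes: bytes) -> tuple:
    """Return (index, runtime 'SSSS:OOOO', file offset) for one relocation entry."""
    index = (entry_pos - RELOC_TABLE_OFFSET) // 4
    offset, segment = struct.unpack("<HH", entry_bytes)
    runtime = f"{START_SEGMENT + segment:04X}:{offset:04X}"
    file_pos = HEADER_PARAGRAPHS * 16 + segment * 16 + offset
    return index, runtime, file_pos

print(describe_entry(0x54, bytes.fromhex("05 00 20 00")))  # (5, '020D:0005', 965)
```

File offset 965 is 0x3c5: the entry point at 0x3c0, plus 2 for the call's IP, plus 3 to reach its segment operand.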
Now that we understand how the relocation table works, take a close look at the segment values in the relocation entries. We've already seen the five entries with segment 0x0000, and there are a few with segments 0x0020 and 0x0027, but the majority use segment 0x008F. This gives us an idea of how the executable code in the file is structured.
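A sketch for tallying the entries by segment. The demo uses a small made-up table; the comment shows how one might run the same helper (our own name) over the real file, with the table at 0x40 and 93 entries.

```python
import struct
from collections import Counter

def relocation_segment_histogram(data: bytes, table_offset: int, count: int) -> Counter:
    """Count relocation entries by their segment word (the second word of each entry)."""
    return Counter(
        struct.unpack_from("<HH", data, table_offset + 4 * i)[1] for i in range(count)
    )

# Made-up demo table. For the real file, something like:
#   relocation_segment_histogram(open("USNF.EXE", "rb").read(), 0x40, 93)
demo = b"".join(struct.pack("<HH", off, seg) for off, seg in
                [(0x3C, 0x00), (0x05, 0x20), (0x10, 0x8F), (0x20, 0x8F)])
for seg, n in sorted(relocation_segment_histogram(demo, 0, 4).items()):
    print(f"{seg:04X}: {n}")
# 0000: 1
# 0020: 1
# 008F: 2
```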
Using the information we've discovered, we can map out the file structure.