Q: Why even learn the structure of the executable?
A: Almost all malware we encounter is an executable file, and the PE file type (Windows) is probably the most common among malware. Knowing how this works is the basis of reverse engineering.
What is the PE Header?
Computers are inherently dumb; they don’t know how to read, load, and execute a file without someone telling them how.
The PE header is how Windows executables specify all the information our operating system needs to load and execute files properly.
The PE header packs a lot of helpful information into a compact structure. Thanks to Microsoft’s love for backward compatibility, some of that information is obsolete, so let us navigate the structure together.
Header Deep Dive
Starting from the DOS header, a structure that predates the PE header, we will go over the different parts. Before we do so, I want to share the absolute best infographic I’ve ever seen on the subject – https://www.openrce.org/reference_library/files/reference/PE%20Format.pdf
DOS Header
Used since the MS-DOS times, it is mostly legacy and obsolete, but there are two fields we should focus on:
- e_magic – The magic identifier of a Windows executable. It should contain the letters “MZ”, which is why some people call Windows executables MZ executables. The letters are the initials of Mark Zbikowski, one of the MS-DOS developers and the designer of this executable format.
- e_lfanew – The file offset of the new header format, the NT header. The field itself sits at offset 0x3C of the file.
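To make the two fields concrete, here is a minimal Python sketch that reads them with the standard `struct` module. The function name `parse_dos_header` and the synthetic 64-byte buffer are my own illustration, not part of any library; on a real sample you would pass the bytes of the file instead.

```python
import struct


def parse_dos_header(data: bytes) -> int:
    """Validate e_magic and return e_lfanew, the file offset of the NT header."""
    if len(data) < 0x40:
        raise ValueError("too short to hold a DOS header")
    if data[:2] != b"MZ":
        raise ValueError("e_magic is not 'MZ'")
    # e_lfanew is a 4-byte little-endian value stored at offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return e_lfanew


# Synthetic header for illustration only: "MZ" plus e_lfanew = 0x80.
sample = bytearray(0x40)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x80)
print(hex(parse_dos_header(bytes(sample))))  # prints 0x80
```

Everything between the two fields is ignored here on purpose, since the loader does not care about it either.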
NT Header
In 1993, Microsoft released Windows NT 3.1, the first version of its New Technology architecture, and with it the PE format. The NT header does not replace the DOS header; it follows it, at the offset stored in the DOS header. It has three parts:
- Signature – The magic identifying value “PE”, followed by two null bytes.
- FileHeader – I think about this header as a metadata header, describing what machine the file is supposed to run on, when the file was created, etc. It holds a lot of interesting information for technical threat intelligence.
- OptionalHeader – Holds all the technical information that tells the OS how to load the file: where the code starts, how much memory to allocate, where to find the list of required libraries, etc.
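As a rough sketch of what the Signature and FileHeader look like on disk, the snippet below decodes the 20-byte FileHeader with `struct`. The helper name `parse_file_header`, the short `MACHINE_NAMES` table, and the sample bytes are assumptions made for illustration. Keep in mind that the timestamp is written by the linker and can be forged or zeroed, so treat it as a hint rather than a fact.

```python
import struct
from datetime import datetime, timezone

# A few common Machine values; the real list is much longer.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}
IMAGE_FILE_DLL = 0x2000  # Characteristics flag: the file is a DLL


def parse_file_header(data: bytes, nt_offset: int) -> dict:
    """Check the Signature and decode the 20-byte FileHeader that follows it."""
    if data[nt_offset:nt_offset + 4] != b"PE\x00\x00":
        raise ValueError("Signature is not 'PE' followed by two null bytes")
    # Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    (machine, section_count, timestamp, _symtab, _symcount,
     optional_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, nt_offset + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "sections": section_count,
        "timestamp": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "optional_header_size": optional_size,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }


# Synthetic bytes (not a real binary): x64, 3 sections, made-up timestamp.
sample = b"PE\x00\x00" + struct.pack(
    "<HHIIIHH", 0x8664, 3, 1_000_000_000, 0, 0, 0xF0, 0x0022)
info = parse_file_header(sample, 0)
print(info["machine"], info["sections"], info["timestamp"].year, info["is_dll"])
```

On a real file, `nt_offset` would be the value read from the DOS header rather than 0.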
OptionalHeader
For something so optional, it seems too important. The name is misleading: the header is optional only for object files, while executable images must have it.
At a quick glance, this header is information overload, so let us simplify and focus on the fields that are interesting and that we all should know about:
- Magic – Specifies the bitness of the executable: 0x10B (PE32) for 32-bit and 0x20B (PE32+) for 64-bit.
- ImageBase – The preferred base address at which the program wants to be loaded. Because of relocation and address space layout randomization (ASLR), the actual load address is often different.
- AddressOfEntryPoint – The address of the program entry point, stored as a Relative Virtual Address (RVA), meaning an offset relative to the address the image is loaded at (ImageBase, when the loader honors it).
- SizeOfImage – The amount of memory (in bytes) that must be allocated to load the file.
- DataDirectory – An array of structures that point to very important tables, which we will cover in a moment.
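The fields above sit at fixed offsets inside the OptionalHeader, with ImageBase being the one that moves between the 32-bit and 64-bit layouts. The following sketch reads them and computes the preferred entry point address as ImageBase + AddressOfEntryPoint. The function name `parse_optional_header` and the synthetic buffer are illustrative assumptions; the offsets follow the PE32 and PE32+ layouts.

```python
import struct

PE32_MAGIC, PE32_PLUS_MAGIC = 0x10B, 0x20B


def parse_optional_header(opt: bytes) -> dict:
    """Decode a few OptionalHeader fields; `opt` starts at the Magic field."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    if magic not in (PE32_MAGIC, PE32_PLUS_MAGIC):
        raise ValueError(f"unexpected Magic {magic:#x}")
    (entry_rva,) = struct.unpack_from("<I", opt, 16)
    if magic == PE32_PLUS_MAGIC:
        (image_base,) = struct.unpack_from("<Q", opt, 24)  # 8 bytes in PE32+
    else:
        (image_base,) = struct.unpack_from("<I", opt, 28)  # 4 bytes in PE32
    (size_of_image,) = struct.unpack_from("<I", opt, 56)
    return {
        "bits": 64 if magic == PE32_PLUS_MAGIC else 32,
        "image_base": image_base,
        "entry_rva": entry_rva,
        # Address of the entry point if the image loads at its preferred base.
        "preferred_entry_va": image_base + entry_rva,
        "size_of_image": size_of_image,
    }


# Synthetic 64-bit OptionalHeader prefix, for illustration only.
sample = bytearray(112)
struct.pack_into("<H", sample, 0, PE32_PLUS_MAGIC)
struct.pack_into("<I", sample, 16, 0x1000)        # AddressOfEntryPoint
struct.pack_into("<Q", sample, 24, 0x140000000)   # ImageBase
struct.pack_into("<I", sample, 56, 0x5000)        # SizeOfImage
fields = parse_optional_header(bytes(sample))
print(fields["bits"], hex(fields["preferred_entry_va"]))  # prints 64 0x140001000
```

With ASLR, the loader may pick a different base, so the address you see in a debugger is the actual base plus the same RVA.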
DataDirectory
Every DataDirectory entry is a struct containing the relative address (RVA) of the data it describes and the size of that data. Each index in the array has a fixed meaning:
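Index | Name | Meaning |
0 | IMAGE_DIRECTORY_ENTRY_EXPORT | Export Directory – The functions this file exposes for other executables to call (typical for DLLs). |
1 | IMAGE_DIRECTORY_ENTRY_IMPORT | Import Directory – The functions the executable needs from other libraries in order to run. |
2 | IMAGE_DIRECTORY_ENTRY_RESOURCE | Resource Directory – Resources such as icons, dialogs, and version information. |
3 | IMAGE_DIRECTORY_ENTRY_EXCEPTION | Exception Directory – Exception-handling and unwind tables (for example, the .pdata section on x64). |
4 | IMAGE_DIRECTORY_ENTRY_SECURITY | Certificate Table – If the file is digitally signed, this is where the signature is stored. Unlike the other entries, its address is a file offset, not an RVA. |
5 | IMAGE_DIRECTORY_ENTRY_BASERELOC | Base Relocation Table – Fix-ups that allow the OS to load the executable at an address other than ImageBase. |
6 | IMAGE_DIRECTORY_ENTRY_DEBUG | Debug information, if left behind by the developers. |
7 | IMAGE_DIRECTORY_ENTRY_ARCHITECTURE | Reserved, must be 0. Older x86 headers name this slot IMAGE_DIRECTORY_ENTRY_COPYRIGHT. |
8 | IMAGE_DIRECTORY_ENTRY_GLOBALPTR | RVA of the value to store in the global pointer register; the size must be 0. |
9 | IMAGE_DIRECTORY_ENTRY_TLS | Thread Local Storage directory – Includes a list of callbacks that run before the main entry point; watch out for that in malware. |
10 | IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG | Load Configuration – Loader settings such as the SafeSEH handler table, the security cookie, and Control Flow Guard data. |
11 | IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT | Bound Import Table – Import addresses resolved ahead of load time (binding). |
12 | IMAGE_DIRECTORY_ENTRY_IAT | Import Address Table – Populated by the loader with the actual addresses of the functions listed in the Import Directory. |
13 | IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT | Delay-Load Import Descriptors – Imports that are resolved only when first used. |
14 | IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR | CLR Runtime Header – Used in .NET files. |
15 | (no name defined) | Reserved, must be 0. |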
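The array is easy to walk by hand: each entry is two 4-byte little-endian values (address, then size), and the NumberOfRvaAndSizes field just before the array says how many entries are present. Below is a minimal sketch under those assumptions. The name `parse_data_directories`, the shortened labels in `DIRECTORY_NAMES`, and the synthetic buffer are mine; the index order follows the constants in winnt.h.

```python
import struct

# Short labels in winnt.h index order (IMAGE_DIRECTORY_ENTRY_* without prefix).
DIRECTORY_NAMES = [
    "EXPORT", "IMPORT", "RESOURCE", "EXCEPTION", "SECURITY", "BASERELOC",
    "DEBUG", "ARCHITECTURE", "GLOBALPTR", "TLS", "LOAD_CONFIG",
    "BOUND_IMPORT", "IAT", "DELAY_IMPORT", "COM_DESCRIPTOR", "RESERVED",
]


def parse_data_directories(opt: bytes) -> list:
    """Return (name, address, size) tuples; `opt` starts at the Magic field."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    if magic == 0x20B:      # PE32+
        count_offset = 108
    elif magic == 0x10B:    # PE32
        count_offset = 92
    else:
        raise ValueError(f"unexpected Magic {magic:#x}")
    (count,) = struct.unpack_from("<I", opt, count_offset)
    count = min(count, len(DIRECTORY_NAMES))  # never trust the header blindly
    entries = []
    for index in range(count):
        address, size = struct.unpack_from(
            "<II", opt, count_offset + 4 + index * 8)
        entries.append((DIRECTORY_NAMES[index], address, size))
    return entries


# Synthetic PE32+ OptionalHeader with an import and a TLS directory.
sample = bytearray(112 + 16 * 8)
struct.pack_into("<H", sample, 0, 0x20B)
struct.pack_into("<I", sample, 108, 16)                      # NumberOfRvaAndSizes
struct.pack_into("<II", sample, 112 + 1 * 8, 0x2000, 0x50)   # IMPORT
struct.pack_into("<II", sample, 112 + 9 * 8, 0x3000, 0x28)   # TLS
entries = parse_data_directories(bytes(sample))
for name, address, size in entries:
    if size:
        print(f"{name}: address={address:#x} size={size:#x}")
has_tls = entries[9][2] != 0  # a non-empty TLS directory deserves a closer look
```

A non-empty TLS entry is not malicious by itself, since compilers emit it for ordinary thread-local variables; it only tells you where to look for callbacks.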
Conclusion
The PE structure is a huge web of connections, and I can’t cover all of it in a single post, but hopefully this gave you a small taste of the format. You don’t need to know all of this by heart, but you need to know it exists.
Plenty of malware uses tricks like TLS callbacks that run prior to the entry point, or registers a custom exception handler that runs malicious code.
Knowing that this structure exists, and what it can hold, helps you deal with these tricks.
I highly suggest you dive deeper into the structure: try to create a parser that takes a PE file and analyzes it, or try to play with the Import table. I remember one of the first things I did was to take a simple program with a while loop that prints and then sleeps for 5 seconds. Playing with the Import table and the IAT, I was able to disable the sleep and give the program some coffee 🙂