0x1 – Getting to know the PE


Q: Why even learn the structure of the executable?
A: Almost all malware we encounter is an executable file, and the Windows PE format is probably the most common format among malware. Knowing how it works is the basis of reverse engineering.

What is the PE Header

Computers are inherently dumb; they don’t know how to read, load, and execute a file without someone telling them how.
The PE header is how Windows executables specify all the information our operating system needs to load and execute files properly.

The PE header contains a lot of helpful information crammed into a small and compact structure, and thanks to Microsoft’s love for backward compatibility, there is some obsolete information in there, so let us navigate the structure together.

Header Deep Dive

Starting from the DOS header, a structure that predates the PE header, we will go over the different parts. Before we do so, I wanted to share the absolute best infographic I’ve ever seen on the subject: https://www.openrce.org/reference_library/files/reference/PE%20Format.pdf

DOS Header

Used since the MS-DOS times, it is mostly legacy and obsolete, but there are two fields we should focus on:

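  1. e_magic – the magic identifier of a Windows executable. It should contain the letters “MZ”, which is why some people call Windows executables MZ executables. The letters are the initials of Mark Zbikowski, one of the MS-DOS developers and the designer of this executable format.
  2. e_lfanew – the file offset of the new header format, the NT header. The field itself sits at offset 0x3C of the file.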
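
To make these two fields concrete, here is a minimal Python sketch that checks e_magic and reads e_lfanew. The function name `parse_dos_header` and the synthetic sample bytes are my own for illustration; the only facts it relies on are that the DOS header is 64 bytes long, starts with “MZ”, and stores e_lfanew as a little-endian 4-byte value at offset 0x3C.

```python
import struct

DOS_HEADER_SIZE = 0x40   # sizeof(IMAGE_DOS_HEADER)
E_LFANEW_OFFSET = 0x3C   # position of e_lfanew inside the DOS header


def parse_dos_header(data: bytes) -> int:
    """Validate e_magic and return e_lfanew, the file offset of the NT header."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("too short to contain a DOS header")
    # e_magic is the first two bytes of the file.
    if data[:2] != b"MZ":
        raise ValueError("e_magic is not 'MZ'")
    # e_lfanew is declared as a LONG; it is read as unsigned here because
    # a negative offset would not be valid anyway.
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    return e_lfanew


if __name__ == "__main__":
    # Synthetic 64-byte header (made up for illustration, not a real file).
    sample = bytearray(DOS_HEADER_SIZE)
    sample[0:2] = b"MZ"
    struct.pack_into("<I", sample, E_LFANEW_OFFSET, 0x80)
    print(hex(parse_dos_header(bytes(sample))))  # prints 0x80
```

For a real file, you would pass in the bytes read with `open(path, "rb").read()` instead of the synthetic sample.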

NT Header

In 1993, Microsoft released Windows NT 3.1, the first version of its New Technology architecture, and with it the PE format and the NT header. The NT header does not replace the DOS header; it follows it, at the offset stored in e_lfanew. It has three parts:

  1. Signature – The magic identifying value “PE”, followed by two null bytes.
  2. FileHeader – I think of this header as a metadata header, describing what machine the file is supposed to run on, when the file was created, etc. It holds a lot of interesting information for technical threat intelligence.
  3. OptionalHeader – Holds all the technical information that tells the OS how to load the file: where the code starts, what libraries are needed, etc.
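
As a rough sketch of the first two parts, the snippet below checks the Signature and unpacks the 20-byte FileHeader that comes right after it. The function name `parse_file_header`, the small `MACHINE_NAMES` map, and the sample bytes are assumptions of mine for illustration; the field order follows the IMAGE_FILE_HEADER layout. Keep in mind that TimeDateStamp is written by the build tools and can be forged or meaningless, so treat it as a hint rather than proof.

```python
import struct
from datetime import datetime, timezone

# A few common Machine values; anything else is shown as raw hex.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}
IMAGE_FILE_DLL = 0x2000  # Characteristics flag: the file is a DLL


def parse_file_header(data: bytes, e_lfanew: int) -> dict:
    """Check the 'PE\\0\\0' signature and decode the FileHeader after it."""
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("Signature is not 'PE\\0\\0'")
    # Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    fields = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    machine, n_sections, timestamp, _, _, opt_size, characteristics = fields
    return {
        "Machine": MACHINE_NAMES.get(machine, hex(machine)),
        "NumberOfSections": n_sections,
        "TimeDateStamp": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "SizeOfOptionalHeader": opt_size,
        "IsDll": bool(characteristics & IMAGE_FILE_DLL),
    }


if __name__ == "__main__":
    # Synthetic NT header start (made up for illustration, not a real file).
    sample = b"PE\0\0" + struct.pack("<HHIIIHH", 0x8664, 4, 1700000000, 0, 0, 240, 0x0022)
    for key, value in parse_file_header(sample, 0).items():
        print(f"{key}: {value}")
```

The OptionalHeader starts immediately after these 24 bytes, at e_lfanew + 24, and SizeOfOptionalHeader tells you how long it is.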

OptionalHeader

For something so optional, it seems too important.

At a quick glance, this is information overload at its core, so let us simplify this and focus on some fields that are interesting and we all should know about:

  1. Magic – Specifies the bitness of the executable: 0x10B for a 32-bit image (PE32) or 0x20B for a 64-bit image (PE32+).
  2. ImageBase – The preferred base address at which the program wants to be loaded. Because of relocations and address space layout randomization (ASLR), the loader often picks a different address.
  3. AddressOfEntryPoint – The address of the program entry point, stored as a Relative Virtual Address (RVA), meaning it is relative to the address the image is loaded at (ImageBase by default).
  4. SizeOfImage – The amount of memory (in bytes) that needs to be allocated to load the file, headers and sections included.
  5. DataDirectory – An array of structures that point to very important tables, which we will cover in a moment.
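
Here is a small sketch of how the first four fields can be read, assuming `opt` holds the raw OptionalHeader bytes (which start at e_lfanew + 24). The function name `parse_optional_header` and the sample values are mine; the offsets are the ones from the PE32 and PE32+ layouts, where ImageBase is 4 bytes at offset 28 for 32-bit images and 8 bytes at offset 24 for 64-bit images.

```python
import struct

PE32_MAGIC = 0x10B       # 32-bit image
PE32_PLUS_MAGIC = 0x20B  # 64-bit image


def parse_optional_header(opt: bytes) -> dict:
    """Read a few fields; `opt` must start at the first byte of OptionalHeader."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    if magic == PE32_MAGIC:
        bitness = 32
        (image_base,) = struct.unpack_from("<I", opt, 28)
    elif magic == PE32_PLUS_MAGIC:
        bitness = 64
        (image_base,) = struct.unpack_from("<Q", opt, 24)
    else:
        raise ValueError(f"unexpected Magic: {magic:#x}")
    # These two fields sit at the same offset in both layouts.
    (entry_rva,) = struct.unpack_from("<I", opt, 16)
    (size_of_image,) = struct.unpack_from("<I", opt, 56)
    return {
        "Bitness": bitness,
        "ImageBase": image_base,
        "AddressOfEntryPoint": entry_rva,
        "SizeOfImage": size_of_image,
        # Where the entry point lands if the image is loaded at ImageBase.
        "PreferredEntryPointVA": image_base + entry_rva,
    }


if __name__ == "__main__":
    # Synthetic PE32+ header (made up for illustration, not a real file).
    sample = bytearray(112)
    struct.pack_into("<H", sample, 0, PE32_PLUS_MAGIC)
    struct.pack_into("<I", sample, 16, 0x1000)
    struct.pack_into("<Q", sample, 24, 0x140000000)
    struct.pack_into("<I", sample, 56, 0x5000)
    print(hex(parse_optional_header(bytes(sample))["PreferredEntryPointVA"]))  # prints 0x140001000
```

The last value shows what an RVA means in practice: the entry point is simply ImageBase plus AddressOfEntryPoint, or the actual load address plus AddressOfEntryPoint when the loader relocates the image.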

DataDirectory

Every DataDirectory entry is a struct containing the RVA of the data it describes and the size of that data in bytes. Each index in the array has a different meaning:

Index | Name | Meaning
0 | IMAGE_DIRECTORY_ENTRY_EXPORT | Export Directory – All the functions this file exposes to other executables that want to call into it.
1 | IMAGE_DIRECTORY_ENTRY_IMPORT | Import Directory – All the functions the executable needs to be able to run.
2 | IMAGE_DIRECTORY_ENTRY_RESOURCE | Resource Directory – Resources like icons
3 | IMAGE_DIRECTORY_ENTRY_EXCEPTION | Exception handler table (function and unwind information, mostly relevant on 64-bit)
4 | IMAGE_DIRECTORY_ENTRY_SECURITY | If the file is digitally signed, this is where the signature is stored
5 | IMAGE_DIRECTORY_ENTRY_BASERELOC | Used for relocating the base address, allows the OS to load the executable at any address it selects
6 | IMAGE_DIRECTORY_ENTRY_DEBUG | Debug information, if left behind by the developers
7 | IMAGE_DIRECTORY_ENTRY_ARCHITECTURE (IMAGE_DIRECTORY_ENTRY_COPYRIGHT in old x86 usage) | Reserved, must be 0
8 | IMAGE_DIRECTORY_ENTRY_GLOBALPTR | RVA of the value for the global pointer register
9 | IMAGE_DIRECTORY_ENTRY_TLS | Essentially a list of functions to call before running the main entry point, watch out for that in malware.
10 | IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG | Load configuration table
11 | IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT | Bound import table
12 | IMAGE_DIRECTORY_ENTRY_IAT | Will be populated at load time with the actual addresses of all the functions from the Import directory
13 | IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT | Delay-load import descriptors
14 | IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR | Used in .NET files.
15 | (no named constant) | Reserved, must be 0
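
To see how these entries are laid out, here is a hedged sketch that walks the DataDirectory array and flags a non-empty TLS directory. It assumes `opt` holds the raw OptionalHeader bytes; the function names and sample values are mine for illustration. The index-to-name mapping follows winnt.h, where index 7 is ARCHITECTURE, the TLS directory is index 9, and index 15 is reserved. The array starts right after the NumberOfRvaAndSizes field, which sits at offset 92 in PE32 and 108 in PE32+.

```python
import struct

DIRECTORY_NAMES = [
    "EXPORT", "IMPORT", "RESOURCE", "EXCEPTION", "SECURITY", "BASERELOC",
    "DEBUG", "ARCHITECTURE", "GLOBALPTR", "TLS", "LOAD_CONFIG",
    "BOUND_IMPORT", "IAT", "DELAY_IMPORT", "COM_DESCRIPTOR", "RESERVED",
]


def parse_data_directories(opt: bytes) -> list:
    """Return (index, name, rva, size) tuples from raw OptionalHeader bytes."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    if magic == 0x10B:      # PE32
        count_offset = 92
    elif magic == 0x20B:    # PE32+
        count_offset = 108
    else:
        raise ValueError(f"unexpected Magic: {magic:#x}")
    (count,) = struct.unpack_from("<I", opt, count_offset)
    count = min(count, len(DIRECTORY_NAMES))  # do not trust oversized counts
    entries = []
    for index in range(count):
        offset = count_offset + 4 + index * 8  # each entry is two DWORDs
        if offset + 8 > len(opt):
            break  # header is truncated
        rva, size = struct.unpack_from("<II", opt, offset)
        entries.append((index, DIRECTORY_NAMES[index], rva, size))
    return entries


def has_tls_directory(entries: list) -> bool:
    """True when the TLS entry is present and non-empty."""
    return any(name == "TLS" and rva and size for _, name, rva, size in entries)


if __name__ == "__main__":
    # Synthetic PE32 header with an import and a TLS directory (illustration only).
    sample = bytearray(96 + 16 * 8)
    struct.pack_into("<H", sample, 0, 0x10B)
    struct.pack_into("<I", sample, 92, 16)
    struct.pack_into("<II", sample, 96 + 1 * 8, 0x2000, 0x50)
    struct.pack_into("<II", sample, 96 + 9 * 8, 0x3000, 0x18)
    for index, name, rva, size in parse_data_directories(bytes(sample)):
        if size:
            print(index, name, hex(rva), hex(size))
    print("TLS present:", has_tls_directory(parse_data_directories(bytes(sample))))
```

One quirk worth knowing: for the SECURITY entry, the first field is a file offset rather than an RVA, so it should not be translated like the others.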

Conclusion

The PE Structure is a huge web of connections, and I am not able to cover this in a single post, but hopefully, this gave you a small taste of the format. You don’t need to know all of this by heart, but you need to know it exists.


Plenty of malware uses tricks like TLS callbacks that run prior to the entry point, or adds a custom exception handler that runs malicious code.

Knowing this structure exists and what it can hold helps you deal with these tricks.
I highly suggest you dive deeper into the structure, try to create a parser that takes a PE file and analyzes it, or try to play with the Import table. I remember one of the first things I did was to take a simple program with a while loop that prints and then sleeps for 5 seconds. Playing with the import table and the IAT, I was able to disable the sleep and give the program some coffee 🙂

