0x1 – Getting to know the PE

Q: Why even learn the structure of the executable?
A: Almost all malware we encounter is an executable file, and the PE file type (Windows) is probably the most common among malware. Knowing how this works is the basis of reverse engineering.

What is the PE Header

Computers are inherently dumb; they don’t know how to read, load, and execute a file without someone telling them how.
The PE header is how Windows executables specify all the information our operating system needs to load and execute files properly.

The PE header contains a lot of helpful information crammed into a compact structure, and thanks to Microsoft’s love for backward compatibility, some of that information is obsolete, so let us navigate the structure together.

Header Deep Dive

Starting from the DOS header, a structure that predates the PE header, we will go over the different parts. Before we do so, I want to share the absolute best infographic I’ve ever seen on the subject – https://www.openrce.org/reference_library/files/reference/PE%20Format.pdf

DOS Header

Used since the MS-DOS times, it is mostly legacy and obsolete, but there are two fields we should focus on:

e_magic – the magic identifier of the Windows executable; it should contain the letters “MZ”, which is why some people call Windows executables MZ executables. The letters are the initials of Mark Zbikowski, one of the lead developers of MS-DOS and the designer of this executable format.
e_lfanew – the file offset of the new header format, the NT header.
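
To make these two fields concrete, here is a minimal Python sketch that checks the magic value and reads the pointer to the NT header (named e_lfanew in winnt.h). The function name read_dos_header is my own, and the offsets assume the standard 64-byte IMAGE_DOS_HEADER layout, where that pointer is the last field at offset 0x3C.

```python
import struct

DOS_HEADER_SIZE = 64    # size of IMAGE_DOS_HEADER (assumed standard layout)
E_LFANEW_OFFSET = 0x3C  # e_lfanew is the last 4 bytes of the DOS header


def read_dos_header(data: bytes) -> int:
    """Check e_magic and return e_lfanew, the file offset of the NT header.

    read_dos_header is a hypothetical helper name, not part of any library.
    """
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("too short to hold a DOS header")
    if data[:2] != b"MZ":
        raise ValueError("e_magic is not 'MZ'")
    # Read the 4-byte little-endian offset (treated as unsigned for simplicity).
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    return e_lfanew
```

To try it, read a file in binary mode and pass the bytes in, e.g. read_dos_header(open(path, "rb").read()), where path is whatever sample you are looking at. The returned offset is where the next header starts.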

NT Header

In 1993, Microsoft released Windows NT 3.1, the first version of its New Technology (NT) architecture. With it came a new header, the NT Header. It did not replace the DOS Header: the DOS Header still comes first in the file and points to the NT Header.

Signature – The magic identifying value of “PE” (the two letters followed by two null bytes)
FileHeader – I think about this header as a metadata header, describing what machine the file is supposed to run on, when the file was created, etc. It holds a lot of interesting information for technical threat intelligence.
OptionalHeader – Holds all the technical information that tells the OS how to load the file: where the code starts, what libraries are needed, etc.
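
As a rough illustration of the first two parts, the sketch below checks the signature and unpacks the 20-byte FileHeader. The function name read_file_header, the nt_offset parameter (the value taken from the DOS header) and the small machine-name table are my own choices; the field layout and constants are the ones I know from winnt.h. Keep in mind that the timestamp is written by the linker and can be faked or zeroed, so treat it as a hint.

```python
import struct
from datetime import datetime, timezone

# IMAGE_FILE_HEADER: Machine, NumberOfSections, TimeDateStamp,
# PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
FILE_HEADER_FORMAT = "<HHIIIHH"
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}
IMAGE_FILE_DLL = 0x2000  # Characteristics flag set for DLLs


def read_file_header(data: bytes, nt_offset: int) -> dict:
    """Parse Signature + FileHeader (hypothetical helper, not a library API)."""
    if data[nt_offset:nt_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (machine, number_of_sections, time_date_stamp, _symbol_table,
     _symbol_count, size_of_optional_header, characteristics) = (
        struct.unpack_from(FILE_HEADER_FORMAT, data, nt_offset + 4))
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "number_of_sections": number_of_sections,
        # Seconds since 1970-01-01 UTC, as recorded at link time.
        "timestamp": datetime.fromtimestamp(time_date_stamp, tz=timezone.utc),
        "size_of_optional_header": size_of_optional_header,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }
```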

OptionalHeader

For something called optional, it is awfully important: every executable image has one.

At a quick glance, this is information overload at its core, so let us simplify this and focus on some fields that are interesting and we all should know about:

Magic – Specifies the bitness of the executable: 0x10B marks a 32-bit (PE32) image and 0x20B marks a 64-bit (PE32+) image
ImageBase – The preferred base address at which the program wants to be loaded. Because of address randomization (ASLR) and relocation, the loader does not always honor it.
AddressOfEntryPoint – The address of the program entry point, stored as a Relative Virtual Address (RVA): an offset from the base address the image is actually loaded at (the ImageBase, when the preference is honored)
SizeOfImage – The size (in bytes) of the image once it is loaded, i.e. how much memory needs to be reserved for the file
DataDirectory – An array of structures that point to very important tables we will cover in a moment
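
Here is a small sketch of how these fields could be pulled out with Python. The helper name read_optional_header and the nt_offset parameter are my own, and the byte offsets are the ones I know from the PE32 and PE32+ optional header layouts (the optional header starts 24 bytes after the start of the NT header). It also shows the arithmetic behind an RVA: entry point address = base address + AddressOfEntryPoint. The result only matches the real runtime address if the loader honors the preferred ImageBase.

```python
import struct

PE32_MAGIC = 0x10B       # 32-bit image
PE32_PLUS_MAGIC = 0x20B  # 64-bit image


def read_optional_header(data: bytes, nt_offset: int) -> dict:
    """Read a few OptionalHeader fields (hypothetical helper name)."""
    opt = nt_offset + 4 + 20  # skip the 4-byte Signature and 20-byte FileHeader
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == PE32_MAGIC:
        bitness = 32
        (image_base,) = struct.unpack_from("<I", data, opt + 28)  # 4 bytes
    elif magic == PE32_PLUS_MAGIC:
        bitness = 64
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)  # 8 bytes
    else:
        raise ValueError(f"unexpected Magic value {magic:#x}")
    (size_of_image,) = struct.unpack_from("<I", data, opt + 56)
    return {
        "bitness": bitness,
        "image_base": image_base,
        "entry_point_rva": entry_rva,
        # Address of the entry point if the image loads at its preferred base.
        "preferred_entry_va": image_base + entry_rva,
        "size_of_image": size_of_image,
    }
```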

DataDirectory

Every DataDirectory entry is a struct containing the relative address (RVA) of the data it describes and the size of that data. Each index in the array has a different meaning:

Index	Name	Meaning
0	IMAGE_DIRECTORY_ENTRY_EXPORT	Export Directory – All the functions the file exposes to other executables that load it.
1	IMAGE_DIRECTORY_ENTRY_IMPORT	Import Directory – All the external functions the executable needs to be able to run.
2	IMAGE_DIRECTORY_ENTRY_RESOURCE	Resource Directory – Resources like icons
3	IMAGE_DIRECTORY_ENTRY_EXCEPTION	Exception Directory – Exception handling tables
4	IMAGE_DIRECTORY_ENTRY_SECURITY	If the file is digitally signed, this is where the signature is stored (its address is a file offset rather than an RVA)
5	IMAGE_DIRECTORY_ENTRY_BASERELOC	Used for relocating the base address, allows the OS to load the executable at any address it selects
6	IMAGE_DIRECTORY_ENTRY_DEBUG	Debug information, if left behind by the developers
7	IMAGE_DIRECTORY_ENTRY_ARCHITECTURE (formerly IMAGE_DIRECTORY_ENTRY_COPYRIGHT)	Reserved, must be 0
8	IMAGE_DIRECTORY_ENTRY_GLOBALPTR
9	IMAGE_DIRECTORY_ENTRY_TLS	Essentially a list of functions to call before running the main entry point, watch out for that in malware.
10	IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG
11	IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT
12	IMAGE_DIRECTORY_ENTRY_IAT	Will be populated at runtime with the actual addresses of all the functions from the Import Directory
13	IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT
14	IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR	Used in .NET files.
15	(reserved)	Reserved, must be 0
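
To see which directories a file actually uses, something along these lines could walk the array. This is only a sketch: read_data_directories and nt_offset are names I made up, and it relies on two details not covered above, namely that the NumberOfRvaAndSizes field sits right before the array (at offset 92 of a PE32 optional header and 108 of a PE32+ one) and that each entry is two 4-byte values. Look up each returned index in the table above.

```python
import struct

MAX_DIRECTORIES = 16  # IMAGE_NUMBEROF_DIRECTORY_ENTRIES


def read_data_directories(data: bytes, nt_offset: int) -> list:
    """Return (index, address, size) for each DataDirectory entry.

    Hypothetical helper; nt_offset is the file offset of the NT header.
    """
    opt = nt_offset + 4 + 20  # start of the OptionalHeader
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:    # PE32
        count_offset = opt + 92
    elif magic == 0x20B:  # PE32+
        count_offset = opt + 108
    else:
        raise ValueError(f"unexpected Magic value {magic:#x}")
    (count,) = struct.unpack_from("<I", data, count_offset)
    count = min(count, MAX_DIRECTORIES)  # do not trust oversized counts
    entries = []
    for index in range(count):
        # Each entry is 8 bytes: a 4-byte address followed by a 4-byte size.
        address, size = struct.unpack_from(
            "<II", data, count_offset + 4 + index * 8)
        entries.append((index, address, size))
    return entries
```

Entries whose address and size are both zero are simply unused, so filtering those out gives a quick picture of the file. A non-empty TLS entry in a small executable, for example, is worth a closer look.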

Conclusion

The PE Structure is a huge web of connections, and I am not able to cover this in a single post, but hopefully, this gave you a small taste of the format. You don’t need to know all of this by heart, but you need to know it exists.

A lot of malware uses tricks like TLS functions that run prior to the entry point, or adds a custom exception handler that runs malicious code.

Knowing that this structure exists and what it can hold helps you deal with these tricks.
I highly suggest you dive deeper into the structure: try to create a parser that takes a PE file and analyzes it, or try to play with the Import table. I remember that one of the first things I did was take a simple program with a while loop that prints and then sleeps for 5 seconds. Playing with the Import table and the IAT, I was able to disable the sleep and give the program some coffee 🙂

0xdavid