7 Summary and conclusions

The traditional operating system loader decodes an object file and creates an image in memory. Such loaders, however, understand only one binary file format (BFF). Ideally, a generic loader would understand several BFFs. To capture the system environment of an object file, we introduced a general notation that describes it as a tuple (M, OS, BFF): the machine architecture M, the operating system OS and the binary file format BFF.

Apart from image creation in operating systems, a loader can also extract the information needed by any machine code manipulation tool. Of all such tools (disassemblers, decompilers, debuggers, binary translators and tracers/profilers), the binary translator is the most interesting because its input undergoes a complete change of environment. The input to the translator is an object file characterised by (M1,OS1,BFF1), whereas its output is a binary object for a completely different environment (M2,OS2,BFF2).

Traditionally, a decoder must be written for every type of object file to be manipulated. For example, n different types of object files, that is n (M,OS,BFF) tuples, require n loaders, one for each environment. Unlike traditional loaders, which understand only one BFF, a retargetable loader (RL) is designed to be generic and to understand a wide range of object formats.
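The contrast between one loader per format and a single generic entry point can be sketched as follows. This is only an illustration, not the SRL itself: the names `identify_bff` and `load` are hypothetical, the per-format loaders are supplied by the caller, and the NE check is a simplification that trusts the 32-bit value at offset 0x3C of the DOS stub.

```python
import struct


def identify_bff(data: bytes) -> str:
    """Guess the binary file format from the signature bytes of an object file."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:2] == b"MZ":
        # A Windows NE file starts with a DOS stub; the 32-bit value at
        # offset 0x3C gives the position of the "NE" signature.
        if len(data) >= 0x40:
            (new_header,) = struct.unpack_from("<I", data, 0x3C)
            if data[new_header:new_header + 2] == b"NE":
                return "NE"
        return "EXE"
    return "unknown"


def load(data: bytes, loaders: dict):
    """Dispatch to the loader registered for the detected format.

    `loaders` maps a format name ("EXE", "NE", "ELF") to a callable;
    these callables are assumed here and are not defined in the text.
    """
    bff = identify_bff(data)
    if bff not in loaders:
        raise ValueError(f"no loader registered for format: {bff}")
    return loaders[bff](data)
```

A tool built this way calls `load` once and leaves the choice of format-specific routine to the table, so supporting a new format means adding an entry rather than rewriting the tool.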

There are essentially three approaches to loading a binary object: handcrafting the code, using library routines or using specifications. An RL can be built using library routines or specifications. Library routines are the simpler idea but can be difficult in practice: tools such as the BFD library are uninviting because of their complexity. Specifications are easily understood and trouble free once they have been developed, which makes a BFF grammar an ideal basis for developing an RL.

A binary object can be viewed through the following file abstractions: a file header, a number of sections, a relocation table and symbol information. Three BFFs were examined and briefly described: DOS EXE, Windows NE and Solaris ELF.
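As a rough illustration of these abstractions, a format-independent in-memory representation might look like the sketch below. All class and field names are assumptions chosen for the example; they are not the structures generated by the SRL.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    name: str
    address: int      # address at which the section is to be placed
    data: bytes


@dataclass
class Relocation:
    section: str      # name of the section that contains the location to patch
    offset: int       # offset of that location within the section


@dataclass
class Symbol:
    name: str
    value: int


@dataclass
class ObjectFile:
    environment: tuple                      # the (M, OS, BFF) tuple
    header: dict                            # decoded file header fields
    sections: list = field(default_factory=list)
    relocations: list = field(default_factory=list)
    symbols: list = field(default_factory=list)

    def image_size(self) -> int:
        """Total number of bytes held in all sections."""
        return sum(len(s.data) for s in self.sections)
```

For example, `ObjectFile(("x86", "DOS", "EXE"), header={})` describes an empty object for the first of the three environments examined.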

There are a few differences between the grammars used for programming languages and the grammar used for BFFs. The most significant is the ability of the BFF grammar to re-reference information that was previously defined. Previously defined information is critical in object file access: addresses and segment sizes are usually controlled by definitions found in the file header, and their values are known only at run-time. The SRL is a first attempt to develop an RL, using a simple BFF grammar developed by the author.
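The need to re-reference earlier definitions can be seen in the DOS EXE format, where the header states how many relocation entries follow and where they begin. The sketch below is a hand-written illustration and not SRL output; the function and field names are assumptions, while the header layout is the standard 28-byte DOS EXE header.

```python
import struct

# Signature followed by thirteen 16-bit little-endian header fields.
HEADER_FORMAT = "<2s13H"
HEADER_FIELDS = (
    "signature", "last_page_bytes", "pages", "reloc_count",
    "header_paragraphs", "min_alloc", "max_alloc", "ss", "sp",
    "checksum", "ip", "cs", "reloc_offset", "overlay",
)


def load_exe(data: bytes) -> dict:
    """Decode a DOS EXE file into its header, relocation table and image."""
    values = struct.unpack_from(HEADER_FORMAT, data, 0)
    header = dict(zip(HEADER_FIELDS, values))
    if header["signature"] != b"MZ":
        raise ValueError("not a DOS EXE file")

    # Re-reference: both the number of relocation entries and their position
    # are values read from the header a moment ago.
    # Each entry is an (offset, segment) pair of 16-bit values.
    relocations = [
        struct.unpack_from("<HH", data, header["reloc_offset"] + 4 * i)
        for i in range(header["reloc_count"])
    ]

    # The image boundaries are likewise computed from header fields:
    # the header size is in 16-byte paragraphs, the file size in 512-byte pages.
    image_start = header["header_paragraphs"] * 16
    image_end = header["pages"] * 512
    if header["last_page_bytes"]:
        image_end -= 512 - header["last_page_bytes"]

    return {
        "header": header,
        "relocations": relocations,
        "image": data[image_start:image_end],
    }
```

None of the sizes used after the first `unpack_from` call is a constant of the format; each is looked up in data decoded earlier, which is the capability a BFF grammar has to express.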

To demonstrate how the SRL grammar works, specifications for (x86,DOS,EXE), (x86,Windows,NE) and (Sparc,Solaris,ELF) were created and used as inputs to the SRL. For each specification, the SRL outputs a set of object structures (a .h file) and loading routines (a .c file). The outputs for (x86,DOS,EXE) were incorporated into an existing tool, the DCC decompiler, by replacing its loading modules with the SRL-generated loading code. The integration was successful, and the program behaved no differently from before.

The retargetable loader has a lot of potential. Being able to capture different object structures is a big plus for anyone wanting to write machine code manipulation tools. Its ability to express BFF structure and to generate loading information almost automatically is particularly beneficial to binary translation. Many problems in this area remain unsolved. Issues such as how much detail is exposed to the loader and how to create an efficient and flexible RL still need to be worked on. The most efficient and ideal BFF grammar, one that contains the most general structures and constructs for all system environments, has yet to be found.