The traditional operating system loader decodes an object file and creates
an image in memory. Such a loader, however, understands only one binary
file format (BFF). Ideally, a generic loader would be capable of
understanding several BFFs. To capture the system environment of an object
file, we introduced a general notation describing three attributes of the
system: the machine architecture M, the operating system OS and its binary
file format BFF, written as the tuple (M, OS, BFF).
Apart from image creation in operating systems, a loader can also be used
to extract important information in any machine code manipulation tool. Of
all the machine code tools (disassemblers, decompilers, debuggers, binary
translators and tracers/profilers), the binary translator is the most
interesting, as the program it processes undergoes a complete change of
environment. The input to the translator is an object file characterised by
(M1,OS1,BFF1), whereas its output is a binary object characterised by a
completely different environment, (M2,OS2,BFF2).
Traditionally, a decoder must be written for every type of object file we
want to manipulate. For example, given n different types of object file,
that is, n (M,OS,BFF) tuples, we would need to write n loaders, one for
each environment. Unlike a traditional loader, which understands only one
BFF, a retargetable loader (RL) is designed to be generic and to
understand a wide range of object formats.
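The contrast between n separate loaders and a single entry point can be
sketched as follows. This is only an illustration, assuming a table keyed by
the (M, OS, BFF) tuple; the names Environment, LOADERS, register and
load_object are hypothetical and do not come from the original work, and the
two loader bodies are placeholders that merely check a file signature. In a
traditional setting each table entry would be handwritten, whereas with an RL
each would be derived from a specification.

```python
from typing import Callable, Dict, NamedTuple


class Environment(NamedTuple):
    """The (M, OS, BFF) tuple from the text (class name is hypothetical)."""
    machine: str
    os: str
    bff: str


# Hypothetical registry: one loading routine per supported environment.
LOADERS: Dict[Environment, Callable[[bytes], dict]] = {}


def register(env: Environment):
    """Decorator that records a loading routine under its environment."""
    def wrap(fn):
        LOADERS[env] = fn
        return fn
    return wrap


DOS_EXE = Environment("x86", "DOS", "EXE")
SOLARIS_ELF = Environment("Sparc", "Solaris", "ELF")


@register(DOS_EXE)
def load_dos_exe(data: bytes) -> dict:
    # Placeholder: only checks the two-byte "MZ" signature.
    return {"env": DOS_EXE, "valid": data[:2] == b"MZ"}


@register(SOLARIS_ELF)
def load_elf(data: bytes) -> dict:
    # Placeholder: only checks the four-byte ELF magic number.
    return {"env": SOLARIS_ELF, "valid": data[:4] == b"\x7fELF"}


def load_object(env: Environment, data: bytes) -> dict:
    """Single entry point: pick the routine for the given environment."""
    try:
        loader = LOADERS[env]
    except KeyError:
        raise ValueError(f"no loader registered for {env}") from None
    return loader(data)
```

A tool built on such a table would call load_object with the source
environment and the raw file contents, and would not need to know which
routine handles which format.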
There are essentially three approaches to loading a binary object:
handcrafting the code, using library routines or using specifications.
An RL can be built using either library routines or specifications. Using
library routines is the simpler idea, but it can be difficult in practice:
tools such as the BFD library are uninviting because of their complexity.
Specifications are easily understood and trouble free once they have been
developed. For this reason, a BFF grammar is an ideal basis for developing
an RL.
A binary object can be seen as having the following file abstractions: an
object file consists of a file header, a number of sections, a relocation
table and symbol information. Three BFFs were examined and briefly
described: DOS EXE, Windows NE and Solaris ELF.
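The four abstractions named above could be modelled along these lines. This
is a minimal sketch rather than the structures used in the original work:
the class names Section, Relocation and ObjectFile, and all of their fields,
are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Section:
    name: str
    offset: int          # position of the section's bytes within the file
    size: int            # length of the section in bytes
    data: bytes = b""


@dataclass
class Relocation:
    offset: int          # location that must be patched at load time
    kind: str = ""       # format-specific relocation type (hypothetical)


@dataclass
class ObjectFile:
    """Format-independent view: header, sections, relocations, symbols."""
    header: Dict[str, int] = field(default_factory=dict)
    sections: List[Section] = field(default_factory=list)
    relocations: List[Relocation] = field(default_factory=list)
    symbols: Dict[str, int] = field(default_factory=dict)

    def section(self, name: str) -> Optional[Section]:
        """Return the first section with the given name, or None."""
        for sec in self.sections:
            if sec.name == name:
                return sec
        return None
```

Each format-specific loader would then fill in the same ObjectFile shape, so
that the rest of a tool never touches format details directly.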
There are a few differences between the grammars used for programming
languages and the grammar used for BFFs. The most significant is the ability
of the BFF grammar to refer back to information that was previously defined.
Previously defined information is critical in object file access: addresses
and segment sizes are usually controlled by definitions found in the file
header, and their values are determined only at run-time. The SRL is a first
attempt to develop an RL with a simple BFF grammar developed by the author.
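The role of back-references can be illustrated with the DOS EXE (MZ) header,
whose fields e_lfarlc and e_crlc give the file offset and the entry count of
the relocation table. The sketch below is not the SRL grammar: the SPEC
dictionary and the function load_with_spec are hypothetical stand-ins that
only show how a later part of a description can depend on values read
earlier, values that are known only once an actual file is being decoded.

```python
import struct

# The 14 little-endian 16-bit words of the DOS EXE (MZ) header, in file order.
MZ_HEADER_FIELDS = [
    "e_magic", "e_cblp", "e_cp", "e_crlc", "e_cparhdr", "e_minalloc",
    "e_maxalloc", "e_ss", "e_sp", "e_csum", "e_ip", "e_cs", "e_lfarlc",
    "e_ovno",
]

# Hypothetical declarative description (not SRL syntax).
SPEC = {
    "header": {"fields": MZ_HEADER_FIELDS, "format": "<14H"},
    "relocations": {
        "offset_from": "e_lfarlc",   # back-reference: where the table starts
        "count_from": "e_crlc",      # back-reference: number of entries
        "entry_format": "<HH",       # each entry is an (offset, segment) pair
    },
}


def load_with_spec(data: bytes, spec: dict = SPEC) -> dict:
    """Decode the header, then use its values to locate the relocations."""
    values = struct.unpack_from(spec["header"]["format"], data, 0)
    header = dict(zip(spec["header"]["fields"], values))

    rel = spec["relocations"]
    start = header[rel["offset_from"]]    # resolved only now, from the file
    count = header[rel["count_from"]]
    entry_size = struct.calcsize(rel["entry_format"])
    relocations = [
        struct.unpack_from(rel["entry_format"], data, start + i * entry_size)
        for i in range(count)
    ]
    return {"header": header, "relocations": relocations}


# Synthetic image: a 28-byte header followed by two relocation entries.
header_bytes = struct.pack(
    "<14H", 0x5A4D, 0, 1, 2, 2, 0, 0xFFFF, 0, 0x100, 0, 0, 0, 28, 0
)
image = header_bytes + struct.pack("<HHHH", 0x10, 0x0, 0x20, 0x1)
result = load_with_spec(image)
```

A grammar for a programming language has no need for this kind of lookup,
because the length and position of a construct do not depend on numeric
values read earlier in the input.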
To demonstrate how the SRL grammar works, specifications for (x86,DOS,EXE),
(x86,Windows,NE) and (Sparc,Solaris,ELF) were created and used as inputs
to the SRL. For each specification, the SRL outputs a set of object
structures (.h file) and loading routines (.c file). The output for
(x86,DOS,EXE) was incorporated into an existing tool (the DCC decompiler)
by replacing its loading modules with the SRL's loading output. The
integration was successful and the program behaved no differently from
before.
The retargetable loader has a lot of potential. Being able to capture
different object structures is a big plus for anyone wanting to write
machine code manipulation tools. Its ability to express BFF structure and to
provide an almost automatic way of generating loading information is
particularly beneficial in the area of binary translation. Many problems in
this area remain unsolved. Issues such as how much detail should be exposed
to the loader and how to create an efficient and flexible RL still need to
be worked on. An ideal BFF grammar, one that contains the most general
structures and constructs for all system environments, has yet to be found.