2 Related work

Tools for manipulating machine code have existed for some time but are not widely used. Binary translation, disassembly and decompilation are often considered difficult areas because low-level machine details are exposed to the programs. Such tools are attractive for programs that come without source code. Decoding an object binary into a higher level of abstraction is considered a reverse-engineering process.


2.1 Binary Translators

A number of binary translators have been developed in the past few years for the purpose of migrating software. Most of them were developed by large vendors such as IBM, Sun, Digital, HP, AT&T and Tandem. Perhaps the best-known binary translators are Digital's VEST and mx, which translate VAX and MIPS machine instructions, respectively, to 64-bit Alpha instructions. Both of these translators (and all static binary translators) have a run-time environment which reproduces the old machine's operating environment. The run-time environment offers a fall-back interpreter for processing old machine code that was not discovered at translation time (e.g. code reached only through indirect jumps). Other static translators such as Apple's MAE [4], Wabi by Sun [3] and Digital's Freeport Express [2] all use an interpreter as a fall-back mechanism for processing old code.
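
The interplay between static translation and the fall-back interpreter can be illustrated with a minimal sketch. It assumes a hypothetical one-accumulator toy machine; the opcodes and the names `discover`, `translate`, `step` and `run` are invented for illustration and are not taken from VEST, mx or any other translator mentioned here.

```python
# Toy "old machine": one accumulator, instructions stored at integer addresses.
# All opcodes and function names are hypothetical.
PROGRAM = {
    0: ("load", 10),   # acc = 10
    1: ("jmp_acc",),   # indirect jump: the target is only known at run time
    10: ("add", 5),    # reachable only through the indirect jump
    11: ("halt",),
}

def step(op, pc, acc):
    """Semantics of one instruction; returns (next_pc, acc), next_pc None = halt."""
    kind = op[0]
    if kind == "load":
        return pc + 1, op[1]
    if kind == "add":
        return pc + 1, acc + op[1]
    if kind == "jmp":
        return op[1], acc
    if kind == "jmp_acc":
        return acc, acc
    if kind == "halt":
        return None, acc
    raise ValueError(f"unknown opcode {kind!r}")

def discover(program, entry):
    """Static pass: follow fall-through and direct jumps from the entry point."""
    found, work = set(), [entry]
    while work:
        pc = work.pop()
        if pc in found or pc not in program:
            continue
        found.add(pc)
        op = program[pc]
        if op[0] == "jmp":
            work.append(op[1])
        elif op[0] not in ("halt", "jmp_acc"):
            work.append(pc + 1)
    return found

def translate(program, addresses):
    """Stand-in for translation: bind each discovered instruction to a host closure."""
    return {pc: (lambda p, a, op=program[pc]: step(op, p, a)) for pc in addresses}

def run(program, entry=0):
    translated = translate(program, discover(program, entry))
    pc, acc, interpreted = entry, 0, 0
    while pc is not None:
        if pc in translated:
            pc, acc = translated[pc](pc, acc)
        else:
            # Fall-back interpreter: code not discovered at translation time.
            pc, acc = step(program[pc], pc, acc)
            interpreted += 1
    return acc, interpreted
```

In this sketch the static pass finds only addresses 0 and 1, so the two instructions behind the indirect jump are handled by the interpreter at run time.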

Work on developing dynamic translators is addressed in [14]. Dynamic translators do not need a run-time environment as a mechanism for processing old system code; instead, all translation is done "on the fly". A dynamic translator can invoke different parts of the translator whenever it needs to decode a new instruction segment. The translated code can be optimised for better performance by using dynamic compilation techniques. Dynamic compilers such as SELF [15] and 'C [6] can deliver speed-ups of 1.5 to 3 times over statically compiled programs.
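
A minimal sketch of "on the fly" translation follows, again assuming a hypothetical toy machine with invented opcodes and function names. Each block is translated the first time control reaches it and kept in a cache, so no separate fall-back interpreter is needed; the optimisation step mentioned above is not modelled.

```python
# Hypothetical toy program: count the accumulator down from 3 to 0.
PROGRAM = {
    0: ("add", 3),
    1: ("add", -1),
    2: ("jnz", 1),     # jump to address 1 while acc != 0
    3: ("halt",),
}

def translate_block(program, start):
    """Decode one straight-line block starting at `start` into a host closure."""
    ops, pc = [], start
    while True:
        op = program[pc]
        ops.append(op)
        pc += 1
        if op[0] in ("jnz", "halt"):   # a control transfer ends the block
            break
    fall_through = pc

    def block(acc):
        for kind, *args in ops:
            if kind == "add":
                acc += args[0]
            elif kind == "jnz":
                return (args[0] if acc != 0 else fall_through), acc
            elif kind == "halt":
                return None, acc
        raise AssertionError("block must end with a control transfer")

    return block

def run(program, entry=0):
    cache = {}                          # address -> translated block
    pc, acc, executed = entry, 0, 0
    while pc is not None:
        if pc not in cache:             # translate only when first reached
            cache[pc] = translate_block(program, pc)
        pc, acc = cache[pc](acc)
        executed += 1
    return acc, len(cache), executed
```

Here the loop body at address 1 is translated once and then reused from the cache on the next iteration.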



2.2 Disassembling

Disassembling is the inverse of assembling: a binary executable is decoded to produce its corresponding assembly representation. The most difficult problem in disassembling is how to differentiate between code and data. An example is the DOS EXE file format, which contains a single meaningful segment describing code, data and symbol information. Some of the better-known disassemblers are:

Name of disassembler              Description
Sourcer by V Communications, Inc  One of the best commercial disassemblers. Auto-detects code and data.
IDA PRO 3.53                      Supports a wide range of file types and machines. Auto-detects code and data and adds comments automatically.
Symbolic Visual Disassembler      For Windows 95 and Windows NT 32-bit Portable Executable (PE) files.
WDASM                             Windows disassembler.
TPU2ASM                           Turbo Pascal unit disassembler.
ID                                COM file disassembler.
Bubble                            COM and EXE disassembler.



2.3 Other Machine code manipulation tools

The New Jersey Machine-Code (NJMC) toolkit [17] helps programmers write programs that process machine code: assemblers, disassemblers, tracers, profilers and debuggers. The toolkit allows machine code to be encoded and decoded symbolically. From a simple machine instruction specification, the NJMC toolkit generates code for machine code manipulation. The specification language describes machine architectures using four elements: fields, tokens, patterns and constructors. The toolkit has been used to develop a retargetable linker [20] and a retargetable debugger [21].
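
The four specification elements can be pictured with a rough sketch. This is plain Python, not the toolkit's specification language, and the 16-bit instruction format, field layout and opcode values are all assumptions made for illustration. Fields are bit ranges within a token, patterns constrain field values, and constructors map symbolic operands to an encoded word.

```python
TOKEN_BITS = 16                        # token: a hypothetical 16-bit instruction word

# Fields: name -> (lowest bit, highest bit) within the token.
FIELDS = {"op": (12, 15), "rd": (8, 11), "rs": (4, 7), "imm": (0, 3)}

# Patterns: constraints on field values that identify an instruction.
PATTERNS = {"add": {"op": 0x1}, "sub": {"op": 0x2}}

def get_field(word, name):
    lo, hi = FIELDS[name]
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def set_field(word, name, value):
    lo, hi = FIELDS[name]
    if not 0 <= value < (1 << (hi - lo + 1)):
        raise ValueError(f"value {value} does not fit in field {name!r}")
    return word | (value << lo)

def construct(name, rd, rs, imm=0):
    """Constructor: map symbolic operands to the binary encoding."""
    word = 0
    for field, value in {**PATTERNS[name], "rd": rd, "rs": rs, "imm": imm}.items():
        word = set_field(word, field, value)
    return word

def decode(word):
    """Match a word against the patterns and recover its operands."""
    for name, constraints in PATTERNS.items():
        if all(get_field(word, f) == v for f, v in constraints.items()):
            return name, get_field(word, "rd"), get_field(word, "rs"), get_field(word, "imm")
    return None
```

Because encoding and decoding are both driven by the same tables, a single description serves an assembler-like and a disassembler-like use, which is the kind of reuse the toolkit aims for.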

EEL (Executable Editing Library) [18] is a new C++ library which provides machine-independent abstractions for analysing and modifying executable programs. Tools built on EEL can analyse and modify executables without being concerned with the underlying machine instruction set or the object file format. EEL uses the NJMC machine specifications to provide machine instruction information and the BFD library for object file access. Its internal representation uses register-transfer level (RTL) [12] instruction descriptions to capture the semantics of a machine instruction.