Tools for manipulating machine code have been around for some time but are
not used intensively. Binary translation, disassembly and decompilation are
often considered difficult areas because low-level machine details are
exposed to the programs. Such tools are attractive for programs that come
without source code. Decoding an object binary into a higher level of
abstraction is considered a reverse-engineering process.
2.1 Binary Translators
A number of binary translators have been developed in the past few years
for the purpose of migrating software. Most were successfully developed by
big players such as IBM, Sun, Digital, HP, AT&T and Tandem. Perhaps the
best-known binary translators are Digital's VEST and mx. They translate
VAX and MIPS machine instructions, respectively, to
64-bit Alpha instructions. Both of these translators (and all static binary
translators) have a run-time environment which reproduces the old machine's
operating environment. The run-time environment offers a fall-back
interpreter for processing old machine code that was not discovered at
translation time (e.g. the targets of indirect jumps). Other static
translators such as Apple's MAE [4], Sun's Wabi [3] and Digital's Freeport
Express [2] all use an interpreter as a fall-back mechanism for processing
old code.
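
The division of labour between static translation and the fall-back
interpreter can be sketched as follows. This is only an illustration under
stated assumptions: the toy instruction set, the PROGRAM layout and all
function names are hypothetical, and a Python closure stands in for generated
target-machine code. It does not describe how VEST, mx or any other translator
listed above is implemented.

```python
# Toy "old machine": address -> instruction tuple (hypothetical ISA).
PROGRAM = {
    0: ("li", "r1", 5),        # r1 = 5
    1: ("li", "r2", 3),        # r2 = 3, later used as a jump target
    2: ("jr", "r2"),           # indirect jump: target unknown statically
    3: ("add", "r1", "r1"),    # only reachable through the indirect jump
    4: ("halt",),
}

def step(instr, regs, pc):
    """Execute one old-machine instruction; return the next pc (None = halt)."""
    op = instr[0]
    if op == "li":
        regs[instr[1]] = instr[2]
        return pc + 1
    if op == "add":
        regs[instr[1]] += regs[instr[2]]
        return pc + 1
    if op == "jmp":
        return instr[1]
    if op == "jr":
        return regs[instr[1]]
    if op == "halt":
        return None
    raise ValueError(f"unknown opcode {op!r}")

def translate_static(program, entry):
    """Translate every instruction that can be discovered without running it."""
    translated, worklist = {}, [entry]
    while worklist:
        pc = worklist.pop()
        if pc in translated or pc not in program:
            continue
        instr = program[pc]
        # A closure stands in for generated target-machine code.
        translated[pc] = lambda regs, i=instr, p=pc: step(i, regs, p)
        op = instr[0]
        if op == "jmp":
            worklist.append(instr[1])    # direct target is known statically
        elif op not in ("jr", "halt"):
            worklist.append(pc + 1)      # fall through to the next instruction
        # "jr": the target depends on a run-time value, so discovery stops here
    return translated

def run(program, entry=0):
    translated = translate_static(program, entry)
    regs = {"r1": 0, "r2": 0}
    pc, interpreted = entry, []
    while pc is not None:
        if pc in translated:
            pc = translated[pc](regs)            # run translated code
        else:
            interpreted.append(pc)               # fall-back interpreter
            pc = step(program[pc], regs, pc)
    return regs, sorted(translated), interpreted

regs, translated, interpreted = run(PROGRAM)
print(translated, interpreted, regs)
```

In this sketch addresses 0 to 2 are translated ahead of time, while addresses
3 and 4 are reached only through the indirect jump and are therefore handled
by the interpreter at run time.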
Work on developing dynamic translators is being addressed in [14]. Dynamic
translators do not need a run-time environment as a mechanism for processing
old system code; instead, all translation is done "on the fly". A dynamic
translator can invoke different parts of the translator whenever it needs
to decode a new instruction segment. The translated code can be optimised
for better performance by using dynamic compilation techniques. Dynamic
compilers such as SELF [15] and 'C [6] can provide a performance gain of
1.5 to 3 times over statically compiled programs.
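
A minimal sketch of on-the-fly translation, assuming a hypothetical toy
instruction set: a segment is decoded only when execution first reaches its
start address, and the result is kept in a cache for reuse. The names
translate_block, execute_block and run_dynamic are illustrative only, and the
sketch omits the optimisation step mentioned above.

```python
# Toy loop (hypothetical ISA): sums 3 + 2 + 1 into r2.
LOOP = {
    0: ("li", "r1", 3),
    1: ("li", "r2", 0),
    2: ("add", "r2", "r1"),
    3: ("addi", "r1", -1),
    4: ("bnez", "r1", 2),      # branch to address 2 while r1 != 0
    5: ("halt",),
}

def translate_block(program, pc):
    """Decode one segment, up to and including its control transfer."""
    block = []
    while True:
        instr = program[pc]
        block.append((pc, instr))
        if instr[0] in ("bnez", "halt"):
            return block
        pc += 1

def execute_block(block, regs):
    """Run a translated segment; return the next pc (None = halt)."""
    for addr, instr in block:
        op = instr[0]
        if op == "li":
            regs[instr[1]] = instr[2]
        elif op == "add":
            regs[instr[1]] += regs[instr[2]]
        elif op == "addi":
            regs[instr[1]] += instr[2]
        elif op == "bnez":
            return instr[2] if regs[instr[1]] != 0 else addr + 1
        elif op == "halt":
            return None
    raise AssertionError("segment did not end in a control transfer")

def run_dynamic(program, entry=0):
    cache = {}                     # segment start address -> translated segment
    regs = {"r1": 0, "r2": 0}
    pc, executed = entry, 0
    while pc is not None:
        if pc not in cache:        # translate on the fly, on first use only
            cache[pc] = translate_block(program, pc)
        pc = execute_block(cache[pc], regs)
        executed += 1
    return regs, sorted(cache), executed

regs, segment_starts, executed = run_dynamic(LOOP)
print(regs, segment_starts, executed)
```

Here three segments are translated (starting at addresses 0, 2 and 5) but four
segments are executed, because the second pass through the loop body reuses
the cached translation.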
2.2 Disassembling
Disassembling is the inverse of assembling, i.e. a binary executable is
decoded to produce its corresponding assembly representation. The most
difficult problem in disassembling is how to differentiate between code and
data. An example is the DOS EXE file format, which contains a single
meaningful segment holding code, data and symbol information. Some of the
better-known disassemblers are:
| Name of disassembler | Description |
| --- | --- |
| Sourcer by V Communications, Inc. | One of the best commercial disassemblers. Automatically detects code and data. |
| IDA PRO 3.53 | Supports a wide range of file types and machines. Automatically detects code and data, and can add comments automatically. |
| Symbolic Visual Disassembler | For Windows 95 and Windows NT 32-bit Portable Executable (PE) files. |
| WDASM | Windows disassembler. |
| TPU2ASM | Turbo Pascal unit disassembler. |
| ID | COM file disassembler. |
| Bubble | COM and EXE disassembler. |
2.3 Other Machine Code Manipulation Tools
The New Jersey Machine-Code (NJMC) toolkit [17] helps programmers write
programs that process machine code instructions, such as assemblers,
disassemblers, tracers, profilers and debuggers. The toolkit allows machine
code to be encoded and decoded symbolically. From a simple machine
instruction specification, the NJMC toolkit generates code for machine code
manipulation. The specification language used in the toolkit describes
machine architectures with four elements: fields, tokens, patterns and
constructors. The toolkit has been used
to develop a retargetable linker [20] and a retargetable debugger [21].
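
The roles of the four specification elements can be illustrated with a small
sketch. It is written in Python rather than in the toolkit's own
specification language, and the 16-bit token layout, the field names and the
two instructions are assumptions made up for the example. The NJMC toolkit
generates comparable encoding and decoding code from a specification; the
hand-written functions below only show the idea.

```python
TOKEN_BITS = 16    # token: the unit of instruction encoding (hypothetical)

# fields: name -> (lowest bit, highest bit) within the token
FIELDS = {"op": (12, 15), "rd": (8, 11), "rs": (4, 7)}

# patterns: constraints on field values that identify an instruction
PATTERNS = {"add": {"op": 0x1}, "sub": {"op": 0x2}}

def set_field(token, name, value):
    lo, hi = FIELDS[name]
    if not 0 <= value < (1 << (hi - lo + 1)):
        raise ValueError(f"{value} does not fit in field {name!r}")
    return token | (value << lo)

def get_field(token, name):
    lo, hi = FIELDS[name]
    return (token >> lo) & ((1 << (hi - lo + 1)) - 1)

def construct(name, rd, rs):
    """Constructor: map a symbolic instruction to its binary token."""
    token = 0
    for field, value in {**PATTERNS[name], "rd": rd, "rs": rs}.items():
        token = set_field(token, field, value)
    return token

def decode(token):
    """Match the token against each pattern and recover the operands."""
    for name, constraints in PATTERNS.items():
        if all(get_field(token, f) == v for f, v in constraints.items()):
            return name, get_field(token, "rd"), get_field(token, "rs")
    raise ValueError(f"no pattern matches token {token:#06x}")

word = construct("add", 3, 5)
print(hex(word), decode(word))
```

With this layout, the symbolic instruction add with operands 3 and 5 encodes
to 0x1350, and decoding that word recovers the same mnemonic and operands.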
EEL (Executable Editing Library) [18] is a new C++ library which provides machine-independent abstractions for analysing and modifying executable programs. Tools built on EEL can analyse and modify executables without being concerned with the underlying machine instruction set or the object file format. EEL uses NJMC machine specifications to provide machine code instruction information and the BFD library for object file access. Internally, EEL uses register-transfer level (RTL) [12] instruction descriptions to capture the semantics of a machine instruction.
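
The idea of describing instruction semantics as register transfers can be
sketched as below. This is an assumption-laden illustration in Python, not
EEL's C++ interface or its actual RTL notation: the classes Reg, Const, BinOp
and Assign and the instruction "add r1, r2, 4" are hypothetical. The point is
that an analysis such as finding the registers an instruction reads and
writes needs only the transfer description, not the instruction encoding.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reg:
    name: str

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class BinOp:
    op: str            # "+" or "-"
    left: object
    right: object

@dataclass(frozen=True)
class Assign:          # one register transfer: dest := src
    dest: Reg
    src: object

def evaluate(expr, regs):
    """Compute the value of an expression in the given register state."""
    if isinstance(expr, Reg):
        return regs[expr.name]
    if isinstance(expr, Const):
        return expr.value
    left, right = evaluate(expr.left, regs), evaluate(expr.right, regs)
    if expr.op == "+":
        return left + right
    if expr.op == "-":
        return left - right
    raise ValueError(f"unknown operator {expr.op!r}")

def uses(expr):
    """Registers read by an expression."""
    if isinstance(expr, Reg):
        return {expr.name}
    if isinstance(expr, Const):
        return set()
    return uses(expr.left) | uses(expr.right)

def execute(transfers, regs):
    """Apply the transfers; all sources are read before any write."""
    values = [(t.dest.name, evaluate(t.src, regs)) for t in transfers]
    regs.update(values)
    return regs

# Hypothetical instruction "add r1, r2, 4" described as r1 := r2 + 4.
ADD = [Assign(Reg("r1"), BinOp("+", Reg("r2"), Const(4)))]

written = {t.dest.name for t in ADD}
read = set().union(*(uses(t.src) for t in ADD))
print(written, read, execute(ADD, {"r1": 0, "r2": 6}))
```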