The idea of a Retargetable Loader (RL) is inviting, but hard to implement.
The Simple Retargetable Loader (SRL) is an attempt to demonstrate the benefit
of using an RL to build a binary manipulation tool. It is a scaled-down
version of the RL and is implemented in C. The SRL is limited in that
its grammar is simple and contains a restricted number of constructs.
As described in the previous chapter, the BFF grammar for the SRL was constructed
using three different environments as a basis: (x86, DOS, EXE), (x86, Windows,
NE) and (Sparc, Solaris, ELF). Of the three, ELF (on a RISC architecture) is the
most complex BFF, EXE (CISC) is the simplest, and NE (CISC) lies somewhere
in between. Nevertheless, the SRL's BFF grammar was developed with generality
in mind.
When implementing an RL, one must consider to what extent the RL takes
part in the decoding of the binary file. Does it decode the whole file and
rewrite it in another representation, or does it simply load the whole file
into memory? How much detail is interpreted by the RL? In the case of the SRL,
the primary functions are producing source code that represents the structures
of the object file, and loading the image.
6.1 Simple Retargetable Loader (SRL)
Fig. 11 describes the contents the SRL produces. The object structures
are the type definitions for the various regions of the binary object file for
(M, OS, BFF). The loading routine initialises the object structures and loads
the object image into memory. The object structures and the loading routine
are implemented in C as .H and .C files respectively.
The input to the SRL is the object specification: a description of the binary
object file for an environment (M, OS, BFF), written in the SRL's grammar.
The specifications for (x86, DOS, EXE), (x86, Windows, NE) and (Sparc, Solaris,
ELF) are used as inputs to the SRL, and the corresponding sets of .H and .C
interface files are produced. The SRL outputs for the (x86, DOS, EXE)
specification are listed in Fig. 12 and Fig. 13. Surprisingly, the specification
for Windows NE turned out to be larger than that for ELF, even though ELF is
supposed to be the most complex of the three. Perhaps if the grammar had more
constructs, finer details could be captured, and more of the ELF structure
would be seen. But is this all part of the loader? Does the loader need to
examine, identify and understand all the different regions of the object
file? How much the loader should know about the object file remains open for
discussion.
To examine the usability of the SRL outputs, the loading files (.H and .C)
produced from the (x86, DOS, EXE) specification were integrated with the DCC
compiler. The loading module for the Intel 286 DCC compiler was replaced with
the corresponding SRL outputs. With a few minor changes to the loading files,
DCC was rebuilt using them. The behaviour of the two versions of DCC was the
same, hence demonstrating the correctness of the SRL outputs. The source code
for the SRL can be found in appendix 3.
6.2 Interface
The interface routines produced by the SRL are very simple. The .C file merely
provides a loading module for setting up an image in memory (Fig. 13). This
is adequate for the DOS EXE format, which is extremely simple, but for other
BFFs one would like interface functions that access the different regions of
the binary file. For example, the Windows NE BFF has a number of tables: the
imported-name table, segment table, module reference table, etc.
The structure of the segment table in the specification is:
DEFINITION seg_table ADDRESS (sh_segToff + sho_off)
    seg_table_ent ARRAY sh_segTent
        ste_logSectoff SIZE 16
        ste_size SIZE 16
        ste_flag SIZE 16
        ste_minsize SIZE 16
    END seg_table_ent
END seg_table
The SRL creates the structure for the segment table in the .H file and sets
up a pointer to the beginning of the table in the image. The SRL generates
no routines for accessing this structure. A programmer who wants to access
a particular entry in the table must hand-code the manipulation of the
structure. A desirable feature for an RL would be to generate a set of
interface routines automatically, thus eliminating the need for such
hand-written code.
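As a rough illustration of the kind of interface routine an RL might generate
for the segment table, the following C sketch decodes one entry by index. It is
not produced by the SRL. The C type seg_table_ent and the functions read_le16
and seg_table_get are hypothetical names, and the sketch assumes that the four
16-bit fields are stored little-endian and packed without padding, in the order
given in the specification above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of one segment table entry. Field names follow
   the specification; the type name itself is an assumption. */
typedef struct {
    uint16_t ste_logSectoff;
    uint16_t ste_size;
    uint16_t ste_flag;
    uint16_t ste_minsize;
} seg_table_ent;

enum { SEG_TABLE_ENT_BYTES = 8 };   /* four 16-bit fields per entry */

/* Read a 16-bit little-endian value from raw bytes. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode entry `index` of a table holding `count` entries that starts at
   `table` in the loaded image. Returns 0 on success, -1 on a bad argument
   or an out-of-range index. */
int seg_table_get(const unsigned char *table, size_t count,
                  size_t index, seg_table_ent *out)
{
    const unsigned char *p;

    if (table == NULL || out == NULL || index >= count)
        return -1;

    p = table + index * SEG_TABLE_ENT_BYTES;
    out->ste_logSectoff = read_le16(p);
    out->ste_size       = read_le16(p + 2);
    out->ste_flag       = read_le16(p + 4);
    out->ste_minsize    = read_le16(p + 6);
    return 0;
}
```

In such a scheme the caller would pass the table pointer set up by the loader
and the entry count taken from sh_segTent. Decoding byte by byte, rather than
casting the image pointer to a struct, keeps the routine independent of the
host's byte order and struct padding.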
/* This file is generated by the BFF generator using the grammar in "dosexe.txt" */
#ifndef _LOAD_H_
#define _LOAD_H_
#ifdef __MSDOS__
#define INT int
#define LONG unsigned long
#else
#define INT short int
#define LONG unsigned int
#endif /* __MSDOS__ */
typedef unsigned char byte;
typedef short int16;
/* LH: read a 16-bit little-endian value from the two bytes at p */
#define LH(p) ((int16)((byte*)(p))[0]+((int16)((byte*)(p))[1]<<8))
/* DOS EXE file header */
typedef struct {
    byte h_sigLo;
    byte h_sigHi;
    INT h_lastPageSize;
    INT h_numPages;
    INT h_numReloc;
    INT h_numParaHeader;
    INT h_minAlloc;
    INT h_maxAlloc;
    INT h_initSS;
    INT h_initSP;
    INT h_checkSum;
    INT h_initIP;
    INT h_initCS;
    INT h_relcTabOffset;
    INT h_overlayNum;
} headerT;
/* In-memory description of a loaded binary file */
typedef struct {
    headerT *header;
    byte* section;
    char* filename;
    LONG imagesize;
    byte* image;
} BFF;
extern BFF* aBFF;
int LoadImage(char* filename); /* returns 0 on failure */
#endif /* _LOAD_H_ */
Fig. 12 .H file generated by the SRL using the (x86, DOS, EXE) specification
/* This file is generated by the BFF generator using the grammar in "dosexe.txt" */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "loader.h"
BFF *aBFF;
/* Load the image of the named file into memory; returns 1 on success, 0 on failure */
int LoadImage(char* filename) {
    FILE *fp;
    LONG imageaddress;
    if ((fp=fopen(filename, "rb"))==NULL) {
        printf("cannot open file ");
        return 0;
    }
    aBFF = (BFF *)malloc(sizeof(BFF));
    aBFF->header = (headerT *)malloc(sizeof(headerT));
    if (fread(aBFF->header, sizeof(headerT), 1, fp) != 1) {
        printf("cannot read file ");
        fclose(fp);
        return 0;
    }
    /* image size = whole pages - header - unused part of the last page */
    aBFF->imagesize = LH(&aBFF->header->h_numPages) * 512 -
        LH(&aBFF->header->h_numParaHeader) * 16 - (512 -
        LH(&aBFF->header->h_lastPageSize));
    aBFF->image = (byte *)malloc(aBFF->imagesize);
    /* the image starts immediately after the header */
    fseek(fp, (long)LH(&aBFF->header->h_numParaHeader) * 16, SEEK_SET);
    if (fread(aBFF->image, (size_t)aBFF->imagesize, 1, fp)!=1) {
        printf("error reading image ");
        fclose(fp);
        return 0;
    }
    imageaddress = LH(&aBFF->header->h_numParaHeader) * 16;
    aBFF->section = aBFF->image + LH(&aBFF->header->h_initIP) + 16 - imageaddress;
    aBFF->filename = (char*)malloc(sizeof(char)*(strlen(filename)+1));
    strcpy(aBFF->filename, filename);
    fclose(fp);
    return 1;
} /* LoadImage */
Fig. 13 .C file generated by the SRL using the (x86, DOS, EXE) specification
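The image size arithmetic in Fig. 13 can be checked in isolation. The following
C sketch reproduces the same calculation on a raw header buffer. The function
names read_le16 and exe_image_size are hypothetical and are not part of the SRL
output. The byte offsets (2, 4 and 8) are derived from the field order of
headerT in Fig. 12, assuming 16-bit fields with no padding.

```c
#include <stddef.h>

/* Read a 16-bit little-endian value as an unsigned quantity. */
static unsigned long read_le16(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8);
}

/* Hypothetical helper: compute the load image size from the first 10 bytes
   of a DOS EXE header, using the same arithmetic as LoadImage in Fig. 13.
   Offsets: 2 = h_lastPageSize, 4 = h_numPages, 8 = h_numParaHeader. */
unsigned long exe_image_size(const unsigned char *hdr)
{
    unsigned long lastPageSize  = read_le16(hdr + 2);
    unsigned long numPages      = read_le16(hdr + 4);
    unsigned long numParaHeader = read_le16(hdr + 8);

    /* whole 512-byte pages, minus the header (16-byte paragraphs),
       minus the unused tail of the last page */
    return numPages * 512UL - numParaHeader * 16UL - (512UL - lastPageSize);
}
```

For a header with h_numPages = 3, h_numParaHeader = 2 and h_lastPageSize = 256,
the calculation gives 3 * 512 - 2 * 16 - (512 - 256) = 1248 bytes. The sketch
performs no validation, so inconsistent header values would wrap around in
unsigned arithmetic.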