 Using the New Jersey Machine Code Toolkit (ML version)
The Icon version of the toolkit has been superseded by the ML version for the purpose of translating matcher files to C. See below for the latest make instructions (the first make section is for reference purposes only). See Tricks for some usage tips.

Be aware that the matcher file needs to have dots, and not underscores, on the left hand side, e.g.

  | RCRB.Eb.CL(the_Eaddr) => 
return (*create->RCRB_Eb_CL)(disassemble_Eaddr(the_Eaddr, create));
instead of
  | RCRB_Eb_CL(the_Eaddr) => 
return (*create->RCRB_Eb_CL)(disassemble_Eaddr(the_Eaddr, create));
If the matcher is generated by the latest version of the toolkit (the one that is soft linked to xtools4), this will automatically happen. If your matcher file is hand generated, it may need some editing.
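
For a hand-written matcher file, the underscore-to-dot edit can be scripted. The following is a minimal Python sketch, not part of the toolkit. It assumes that every arm starts with "|" followed by the constructor name, and that every underscore in that name should become a dot (check this against your spec, since a constructor name could legitimately contain an underscore). The function name dot_arm_names is hypothetical.

```python
import re

# Matches the constructor name at the start of a matcher arm, e.g.
#   "  | RCRB_Eb_CL(the_Eaddr) => "
# Assumption: arms begin with "|" and the name is followed by "(" or "=>".
ARM_NAME = re.compile(r"^(\s*\|\s*)(\w+)(?=\s*(?:\(|=>))")


def dot_arm_names(text):
    """Replace underscores with dots in arm constructor names only."""
    def fix(match):
        return match.group(1) + match.group(2).replace("_", ".")

    # Work line by line so that "^" anchors at the start of each arm.
    return "\n".join(ARM_NAME.sub(fix, line) for line in text.split("\n"))
```

Only the name on the left of the arm is rewritten; the C code in the body (such as create->RCRB_Eb_CL) keeps its underscores, and arms that already use dots are left alone.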

By far the easiest way to run this is to become binstaff, so students wanting to run the new toolkit should either run their own copy on Linux, or get Cristina or myself to run it for them. [Doug seems to be able to run it without becoming binstaff, though, as long as no actual compilation needs to be done.] The toolkit is so memory hungry at present that you can't run it on a 128M SparcStation 5 if you want to use the Pentium spec (you are able to use SparcStation 5s for smaller specs, including the 286 spec). So ssh to olympic before becoming binstaff if you need the Pentium spec. I use this alias (all on one long line):

alias toolkitml='/net/olympic/u6/olympic/languages/ml/nj.30/bin/.run-sml @SMLload=/net/olympic/u6/olympic/tools/NJ/sml/sml-toolkit.sparc-solaris'
This alias and a few others are set up if you are binstaff. Change to the directory where your spec and/or matcher files are handy, and use the toolkitml alias:
cd mydir
toolkitml
You should get this response, after a few seconds:

val it = true : bool
-
If instead you get something like
/net/olympic/u7/olympic/languages/ml/nj.31/bin/.run-sml:
Fatal error -- bad magic number (0x82597) in heap image
this means that you are using the wrong version of sml, and you should change your aliases to point to version 109.30. Note: the toolkit will not compile with version 109.31 of the sml compiler.

This is the recommended and quick way of starting the toolkit. The old way was to load sml-nw.sparc-solaris, then

CM.make();
After lots of activity and about 2 minutes (pay no attention to messages like "unstable" and "unusable"), you would end up with

Constraints are:
Unsolved remnants are:

[introducing new bindings into toplevel environment...]
val it = () : unit
- 
When you eventually want to get out of this (don't do it now!), type ^D (possibly twice).

This is the prompt that you need. At this point, the only thing I know how to do is to translate a matcher file to a C file:

val name = CC.matcher ["path/specname1", "path/specname2"];
[This will take a minute or so, and will emit a few pages of output]
name "path/matcher";
For a concrete example:
val mike = CC.matcher ["temp/pentium-leave.spec", "temp/penwait.spec"];
mike "temp/dis-leave.m";
In this example, the toolkit will produce the file temp/dis-leave.m.d.

The other clue we have is this:

Main.c_code "sparc.spec";
to generate the encoding procedures. He does warn that there are bugs in the present code (as at 22/8/97), and so he recommends using the Icon version of the toolkit for this.

Any other functionality might be guessed by reading the literate sources (tools/NJ/sml/src/*.nw). As of August 1997, the ML toolkit cannot generate matching files (i.e. there is no equivalent of the -symdis switch).

Tricks

The ML version of the toolkit has several capabilities that the Icon version does not, but there are a few limitations. I attempt to summarise these here.

Scattered fields

The toolkit can cope with HP PA/RISC scattered fields. For example, in a "short indexed" load or store, the immediate value in bit positions 11-15 (HP numbering, with bit zero at the left) is stored as follows:
Bit position*   11  12  13  14  15
Field value**    3   2   1   0   4=Sign
*These bits are in HP/IBM form, 0 = MSB
**These bits are in normal form, 0 = LSB

These can be specified as follows:

constructors
  sdisps_faddr(d!, s, b) : faddr { d@[0:3 ] = im4_11,
                                   d@[4:31] = im1_15! }
    is s2_16 = s & b_06 = b & addr_22 = 1 & im4_11
       & im1_15
The immediate field from bits 11 through 15 has to be split into two fields, im4_11 and im1_15. Note that only fields or constants seem to work on the right hand side of the equations (e.g. d@[0:3 ] = im4_11; you don't seem to be able to write d@[0:3 ] = im5_11@[1:4]).

Note that even though this specification has "bit 0 is most significant" at the top, the bit slices use conventional bit numbering. It's necessary to specify all 32 bits of the integer variable d. Note that the fields have to appear in the pattern of the constructor, e.g. & im4_11. I take this to mean that the im4_11 in the equations (in curly braces) is the same im4_11 declared in the fields statements. If these are left out, the constructor won't parse, or worse, it may parse and not match properly (sometimes, but not always, there will be an error such as im4_11 not defined in the translated .m file).
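
As a worked check of the equations above, here is a minimal Python sketch (not toolkit output) of the value the matcher would compute for d. It assumes that im4_11 is the 4-bit field starting at HP bit position 11 and im1_15 is the 1-bit field at position 15, as the field names suggest. The helper names hp_field and short_disp are hypothetical.

```python
def hp_field(word, start, width):
    """Extract a field from a 32-bit word using HP numbering (bit 0 = MSB)."""
    shift = 32 - (start + width)
    return (word >> shift) & ((1 << width) - 1)


def short_disp(word):
    """d@[0:3] = im4_11, d@[4:31] = im1_15! (the "!" sign-extends the field)."""
    im4_11 = hp_field(word, 11, 4)
    im1_15 = hp_field(word, 15, 1)
    sign = -im1_15                 # a sign-extended 1-bit field is 0 or -1
    # Conventional numbering on the left of the equations:
    # bits 0-3 come from im4_11, bits 4-31 are copies of the sign bit.
    return im4_11 | (sign << 4)
```

With im4_11 = 5 and im1_15 = 1 the result is -11, so the five bits together behave as a signed 5-bit displacement in the range -16 to 15.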

This technique can be used in typed constructors (as above), or in ordinary, untyped constructors.

Similar semantics

Again in HP PA/RISC, there are families of instructions with similar semantics, but two or more binary patterns that differ slightly. For example, the floating point add instruction has two forms:
Field   0E  r1  r2  0  r2  f  3  0  r1  t  0  t
Bits     6   5   5  3   1  1  2  1   1  1  1  5

Field   0C  r1  r2  0  fmt  3  0  0  t
Bits     6   5   5  3    2  2  3  1  5
Note that in the 0E form, there are two fields called t; one is a 5 bit field, and one is a 1 bit field. The 0E form of the instruction is used when using the second set of 32 single precision floating point registers. Suppose you want to number the second set of registers 32-63, so you want 6 bit register numbers for the 0E opcode and 5 bits for the 0C opcode. This is easy for the t field, since in the 0C opcode version, there is a zero where the 6th bit would be. But for the r2 field, there is one of the fmt bits, which will be 1 for the cases ,dbl and ,quad, where the register numbers have to be 0-31. So a different constructor is required to differentiate between these two cases.
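
The second row of each layout above gives the field widths in bits, and each row sums to 32. As an illustration only (Python, not part of the toolkit; the name split_fields is hypothetical), the sketch below splits a 32-bit instruction word into the listed fields, most significant field first. Field names are copied from the layouts, so r1, r2 and t appear twice in the 0E form.

```python
# Field names and widths copied from the two layouts above (MSB first).
FADD_0E = [("0E", 6), ("r1", 5), ("r2", 5), ("0", 3), ("r2", 1), ("f", 1),
           ("3", 2), ("0", 1), ("r1", 1), ("t", 1), ("0", 1), ("t", 5)]
FADD_0C = [("0C", 6), ("r1", 5), ("r2", 5), ("0", 3), ("fmt", 2),
           ("3", 2), ("0", 3), ("0", 1), ("t", 5)]


def split_fields(word, layout):
    """Return [(name, value), ...] for a 32-bit word, MSB field first."""
    assert sum(width for _, width in layout) == 32
    fields, remaining = [], 32
    for name, width in layout:
        remaining -= width          # bits still to the right of this field
        fields.append((name, (word >> remaining) & ((1 << width) - 1)))
    return fields
```

Running split_fields on a word against both layouts makes it easy to see which bit of the 0C form lands where the extra register bit sits in the 0E form.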

These different constructors require different names, and also the difference on opcode bits means that there have to be two sets of pattern names to describe these instructions. To keep duplication to a minimum (in fact, to keep it to quadruplication), this is the best I have found so far:

patterns
  flt_c3 is any of
    [ fadd fsub fmpy fdiv ],
    which is FPOP0C & class_21 = 3 & sub_16 = {0 to 3}
  flt_c3.E is any of
    [ fadd.E fsub.E fmpy.E fdiv.E ],
    which is FPOP0E & class_21 = 3 & sub_16 = {0 to 3}
  flt_c3_all is fadd | fsub | fmpy | fdiv |
                fadd.E | fsub.E | fmpy.E | fdiv.E

constructors
  # Class 3, opcode 0C
  flt_c3 ,fmt r1,r2,t is flt_c3 & r_06 = r1 & r_11 = r2 & t_27 = t
                         & fmt_19 = fmt
  # Class 3, opcode 0E
  flt_c3.E ,fmt r1,r2,t { r1@[0:4] = r_06, r1@[5] = r1_24, r1@[6:31] = 0,
                          r2@[0:4] = r_11, r2@[5] = f_19, r2@[6:31] = 0,
                          t@[0:4] = t_27, t@[5] = t_25, t@[6:31] = 0 }
                        is flt_c3.E & f_20 = fmt
                         & r_06 & r1_24 & r_11 & f_19 & t_27 & t_25
In the matching statement for floating point class three operations, the flt_c3_all pattern is used to enumerate all the appropriate constructors. For example, there will be two constructors for fadd; one called fadd and the other called fadd.E. It is necessary to process the name in the disassembler to remove the .E.
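
Both pieces of post-processing are small. As a minimal sketch in Python (the real disassembler is C generated from the matcher file, so this only illustrates the logic, and the function names are hypothetical): the equations for the 0E form put the 5-bit field in bits 0-4 of the register number and the extra bit in bit 5, and the printed name is the constructor name with any trailing .E removed.

```python
def reg6(low5, high1):
    """Mirror r1@[0:4] = r_06, r1@[5] = r1_24, r1@[6:31] = 0."""
    return ((high1 & 1) << 5) | (low5 & 0x1F)


def base_name(constructor_name):
    """Strip the .E suffix so that fadd.E is printed as fadd."""
    if constructor_name.endswith(".E"):
        return constructor_name[:-2]
    return constructor_name
```

For example, a 5-bit field of 3 with the extra bit set gives register 35, which falls in the second set of registers (32-63).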

Making the sml Toolkit

As of November 1998, Norman has simplified the way the sml toolkit is made; see the instructions at http://www.eecs.harvard.edu/~nr/toolkit/ml.html. The only real decisions you have to make are to choose two directories (see mltk/install) and to make sure that they are in your path. Note that the first one is the path to the SML compiler; I misread the instructions and gave it the path to the mltk directory.

As of August 2000, I use:

MLBASE=/u9/luna/extra/tools/smlnj
BINDIR=/u1/luna/tools/bin
Note: These were moved early in 2001. The first is at ~binary/u9.luna.extra/tools/smlnj; the second would now be ~binary/u1.luna.extra/tools/bin. As of June 2001, the toolkit has not been recompiled.

If something goes wrong, you should remove ml-lex+ and sml-nw from $MLBASE/bin.

Once you have made it, cd to the src directory, and run sml-nw. (Make sure this is the one in the compiler's bin directory, not some other version from an alias, or earlier in the path). You should get something like

Standard ML of New Jersey v110.8 [FLINT v1.41], August 5, 1998
[CM+nw+tygen+ebnf+ord+lex+]
Note the "nw+tygen" etc in the last part. Now type
CM.make();
After this has completed, you should see something like
Constraints are:
Unsolved remnants are:

[recovering NW/CM/sparc-unix/ml-lib.sig.bin...GC #1.3.8.9.35.1794: (0 ms)
done]
[recovering NW/CM/sparc-unix/ml-lib.sml.bin... done]
[introducing new bindings into toplevel environment...]
val it = () : unit
-
To save having to do this every time, you can now save the image using:
SMLofNJ.exportML "sml-toolkit";
You can then run this image with sml @SMLload=sml-toolkit or even set up an sml-toolkit script in the usual way.

Memory Usage Explosion

Soon after 1st August 2000, Norman made a change that may cause a memory usage explosion. If this happens to you (e.g. you get a message similar to

/usr/share/smlnj/bin/.run-sml: Error -- unable to map 177995776 bytes, errno =12 ...

), then you need the source from before that change. (As of June 2002, there have been no changes to the toolkit since that change.) To save hassling Norman, you can download the source code tarball of that version here.
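
To put the error message in perspective: errno 12 is ENOMEM (out of memory), and the failed request is roughly 170 MB in a single mapping. A quick check in Python (illustrative only, not part of the toolkit):

```python
import errno

failed_request = 177995776          # bytes, from the error message above
mib = failed_request / 2**20        # size of the failed mapping in MiB
name = errno.errorcode[12]          # symbolic name for errno 12
print(f"{mib:.2f} MiB, errno 12 = {name}")
```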

Last updated: 03/06/02: Added memory usage explosion section