School of ITEE, The University of Queensland

Doug Simon

Structuring Assembly Programs


Doug's 1997 Honours thesis, under the supervision of Cristina Cifuentes, involved the creation of a new structuring algorithm to structure assembly-based program flow graphs.  This new algorithm was prompted as a way to simplify the expensive data structures and analysis required by the algorithms used in the dcc project.

Doug implemented the algorithms used by Cristina in the dcc project (see Structuring Decompiled Graphs or Chapter 6 of the PhD thesis Reverse Compilation Techniques) as well as the new, simplified, structuring algorithms we devised in 1997.  Both algorithms are compared in a tool called AST (assembly structuring tool) for SPARC assembly programs.

Abstract

An automated assembly to high level language (e.g. C) translator would be of great benefit to maintainers of legacy assembly programs. A component of such a translator would derive higher level constructs such as conditional, loops and sequences from the original source. This can be done by applying an algorithm to structure the control flow graph of the assembly source. This is commonly referred to as a structuring algorithm.

Most existing techniques for control flow structuring not easily adapted to structuring assembly code. During research into the feasibility of a binary decompiler, an algorithm more suited to the task of structuring assembly code was developed. This seminar focuses on a number of modifications to this algorithm that were developed as part of an honour's thesis.  In particular we discuss techniques for generating `better' high level code from unstructured assembly control flow graphs. Unstructured graphs are those that will require the use of gotos in the high level code. The quality of the code can be measured by the number of gotos generated.

In addition to our attempt to improve the functionality of the original algorithm, we also seek to improve the runtime speed of the translator by investigating less expensive data structures. Ideally, any improvement in speed will not involve a trade off with functionality.

We present the results of a prototype assembly structuring tool. The results were generated by running the tool on a number of well known, large benchmark programs used by those in the compiler community. From these results, we discuss what combination of improvements in speed and functionality are optimal in some well defined sense.

Code

The assembly structuring tool, AST, is distributed under a BSD license and it is (C) Doug Simon, 1997.

The code base was last compiled on a SPARC Solaris platform with gcc 3.1 on 1st March 2002. 

Two versions of the AST tool can be built: ast and ast_old.  The ast tool implements the new algorithms, based on dominators and post-dominators of a graph's underlying tree.  The ast_old tool implements the algorithms used in the dcc project, which are based on intervals and the derived sequence of graphs.  The Makefile provided in the distribution compiles the ast tool, to compile the ast_old tool, you need to uncomment some lines in the Makefile.

Code base: ast.tar.gz  (49 KB)

Documentation

The Structuring Assembly Programs thesis describes the algorithms used in the AST code base (postscript, 769 KB). 

Bibtex entry:

@MASTERSTHESIS{
   author = "D. Simon",
   title = "Structuring Assembly Programs",
   school = "The University of Queensland",
   type = "Honours Thesis",
   address = "Department of Computer Science and Electrical Engineering",
   date = nov,
   year = 1997
}

 


Last updated: 24 March 2002

This page: http://www.itee.uq.edu.au/~cristina/students/doug/doug.html