JTB Documentation

OLD DOCUMENTATION

Documentation for JTB 1.0

Introduction

The Java Tree Builder (JTB) is a tool used to automatically generate syntax trees with the Java Compiler Compiler (JavaCC) parser generator. This page explains how to use JTB. If you are unfamiliar with the Visitor design pattern, please see the section entitled "Why Visitors?". We also recommend that you read the Tutorial before using JTB.

If you are converting to a newer version of JTB, please see the release notes for the new version.

Overview of Generated Files

To begin using JTB, simply run it using your grammar file as an argument (a list of command-line parameters can be obtained by running JTB without any arguments). This will generate the following:

The file jtb.out.jj, the original grammar file, now with syntax tree building actions inserted.
The subdirectory/package syntaxtree which contains a java class for each production in the grammar.
The subdirectory/package visitor which contains Visitor.java, the default visitor superclass.

To generate your parser, simply run JavaCC using jtb.out.jj as the grammar file.

Let's take a look at all the files and directories JTB generates.

`jtb.out.jj`

This file is the same as the input grammar file except that it now contains code for building the syntax tree during parse. Typically, this file can be left alone after generation. The only thing that needs to be done to it is to run it through JavaCC to generate your parser.

`syntaxtree`

This directory contains syntax tree node classes generated based on the productions in your JavaCC grammar. Each production will have its own class. If your grammar contains 97 productions, this directory will contain 97 classes (plus the special automatically generated nodes--these will be discussed later), with names corresponding to the left-hand side names of the productions. Like jtb.out.jj, after generation these files don't need to be edited. Generate them once, compile them once, and forget about them.

Let's examine one of the classes generated from a production. Take, for example, the following production (from the Java1.1.jj grammar):

void ImportDeclaration() :
{}
{
  "import" Name() [ "." "*" ] ";"
}

This production will generate the file ImportDeclaration.java in the directory (and package) syntaxtree. This file will look like this:

//
// Generated by JTB 0.9.0a
//

//
// Grammar production:
// f0 -> "import"
// f1 -> Name()
// f2 -> [ "." "*" ]
// f3 -> ";"
//
package syntaxtree;

public class ImportDeclaration implements Node {
   public NodeToken f0;
   public Name f1;
   public NodeOptional f2;
   public NodeToken f3;

   public ImportDeclaration(NodeToken n0, Name n1,
   NodeOptional n2, NodeToken n3) {
      f0 = n0;
      f1 = n1;
      f2 = n2;
      f3 = n3;
   }

   public void accept(visitor.Visitor v) {
      v.visit(this);
   }
}

Let us now examine this file from the top down.

The first set of comments obviously shows which version of JTB created this file. The second group of comments is for your benefit, showing the names of the fields of this class (the children of the node), and what parts of the original production they represent. All parts of a production are represented in the tree, including tokens.

Notice that this class is in the package "syntaxtree". Separating the generated tree node classes into their own package greatly simplifies file organization, particularly when the grammar contains a large number of productions. If the grammar is stable and not subject to change, once these classes are generated and compiled, it's not necessary to pay them any more attention. All of the work is done in the visitor classes.

Next you'll note that this class implements an interface named Node. This is one of seven tree node types (one interface and six classes) automatically generated for every grammar. They are as follows:

**Automatically-Generated Tree Node Interface and Classes**
`Node`	Node interface that all tree nodes implement.
`NodeChoice`	Represents a grammar choice such as `( A | B )`
`NodeList`	Represents a list such as `( A )+`
`NodeListOptional`	Represents an optional list such as `( A )*`
`NodeOptional`	Represents an optional such as `[ A ]` or `( A )?`
`NodeSequence`	Represents a nested sequence of nodes
`NodeToken`	Represents a token string such as `"package"`

These will be discussed in greater detail below.

Next come the member variables of the ImportDeclaration class. These are generated based on the right-hand side (RHS) of the production. Their types depend on the items in the RHS, and their names start at f0 and count upward. You may be wondering why these variables are declared as public. Since the visitors which must access these fields reside in a different package than the syntax tree nodes, package visibility cannot be used. We decided that breaking encapsulation was a necessary evil in this case.

The next portion of the generated class is the constructor. It is called from the tree-building actions in the annotated grammar so you will probably not need to use it.

Last is the accept() method. This method is the way in which visitors interact with the class.

`visitor`

This directory contains the generated Visitor superclass and is where the visitors you write can be placed as well. The Visitor class contains one method per production in the grammar, plus one method for each of the six automatically-generated classes. These default methods simply visit each node of the tree, calling the accept() method of each node's children.

Our intent is for the programmer to override only those methods for which specific actions must be performed. For example, in a visitor which simply counts the number of assignment statements in a Java source file, only the method visit(Assignment n) would need to be overridden.

Continuing our above example is the visit(ImportDeclaration n) method of class Visitor:

//
// f0 -> "import"
// f1 -> Name()
// f2 -> [ "." "*" ]
// f3 -> ";"
//
public void visit(ImportDeclaration n) {
   n.f0.accept(this);
   n.f1.accept(this);
   n.f2.accept(this);
   n.f3.accept(this);
}

The comments above each visit method are for the programmer's benefit, showing which field corresponds to which part of the production. In this example n.f0 is a reference to one of the automatically generated classes, NodeToken. n.f1 refers to a nonterminal of type Name. n.f2 refers to a NodeOptional which stores a NodeSequence (more on this later). n.f3 refers to another NodeToken.
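The override-only-what-you-need approach described in this section can be sketched as follows. This is a minimal, self-contained illustration and not real JTB output: the node and visitor classes are hand-written stand-ins nested in one file, ImportDeclaration is reduced to three token fields, and the names ImportCounter, ImportCounterDemo, and countImports are hypothetical.

```java
import java.util.Enumeration;
import java.util.Vector;

// Hand-written stand-ins for JTB output, nested in one class so the sketch
// compiles as a single file. Real generated code lives in the packages
// syntaxtree and visitor.
class ImportCounterDemo {
   interface Node { void accept(Visitor v); }

   static class NodeToken implements Node {
      public String tokenImage;
      public NodeToken(String s) { tokenImage = s; }
      public void accept(Visitor v) { v.visit(this); }
   }

   static class NodeListOptional implements Node {
      public Vector<Node> nodes = new Vector<Node>();
      public void addNode(Node n) { nodes.addElement(n); }
      public Enumeration<Node> elements() { return nodes.elements(); }
      public void accept(Visitor v) { v.visit(this); }
   }

   // Simplified: three token fields instead of the real f0..f3 layout.
   static class ImportDeclaration implements Node {
      public NodeToken f0, f1, f2;
      public ImportDeclaration(NodeToken n0, NodeToken n1, NodeToken n2) {
         f0 = n0; f1 = n1; f2 = n2;
      }
      public void accept(Visitor v) { v.visit(this); }
   }

   // Default visitor: every method only forwards to the node's children.
   static class Visitor {
      public void visit(NodeToken n) { }
      public void visit(NodeListOptional n) {
         for (Enumeration<Node> e = n.elements(); e.hasMoreElements(); ) {
            e.nextElement().accept(this);
         }
      }
      public void visit(ImportDeclaration n) {
         n.f0.accept(this);
         n.f1.accept(this);
         n.f2.accept(this);
      }
   }

   // User-written visitor: overrides a single method, inherits the rest.
   static class ImportCounter extends Visitor {
      public int count = 0;
      @Override
      public void visit(ImportDeclaration n) {
         count++;
         super.visit(n);   // keep the default traversal of the children
      }
   }

   // Builds a list of import nodes by hand (a real tree comes from the parser).
   static int countImports(String... names) {
      NodeListOptional imports = new NodeListOptional();
      for (String name : names) {
         imports.addNode(new ImportDeclaration(
            new NodeToken("import"), new NodeToken(name), new NodeToken(";")));
      }
      ImportCounter counter = new ImportCounter();
      imports.accept(counter);
      return counter.count;
   }

   public static void main(String[] args) {
      System.out.println(countImports("java.util.Vector", "java.io.File")); // prints 2
   }
}
```

Calling super.visit(n) in the override keeps the default traversal, so any children that need visiting are still reached.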

The Automatically Generated Classes

Six classes and an interface are automatically generated for every grammar file. The six classes are responsible for the various EBNF grammar constructs such as ( )+, ( )*, ( )?, etc.

`Node`

The interface Node is implemented by all syntax tree nodes. Node looks like this:

public interface Node {
   public void accept(visitor.Visitor v);
}

All tree node classes implement the accept() method. In all of the generated classes, accept() simply calls the corresponding visit(XXXX n) method of the visitor passed to it, where XXXX is the name of the node's class (for a production class, the name of the production). Note that the visit() methods are overloaded, i.e. they are distinguished by the type of the argument each takes rather than by name.
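The interplay between accept() and the overloaded visit() methods can be illustrated with a small sketch. It assumes simplified, hand-written stand-ins (a Name class with a single token child, and a hypothetical Tracer visitor that records which overload ran); none of this is actual JTB output.

```java
// Sketch of how accept() selects an overloaded visit() method.
// All classes are simplified stand-ins written for this example.
class DoubleDispatchDemo {
   interface Node { void accept(Visitor v); }

   static class NodeToken implements Node {
      public String tokenImage;
      public NodeToken(String s) { tokenImage = s; }
      // Inside this class `this` has static type NodeToken, so the compiler
      // binds the call below to visit(NodeToken).
      public void accept(Visitor v) { v.visit(this); }
   }

   // Simplified stand-in for a production class with one token child.
   static class Name implements Node {
      public NodeToken f0;
      public Name(NodeToken n0) { f0 = n0; }
      public void accept(Visitor v) { v.visit(this); }   // binds to visit(Name)
   }

   static class Visitor {
      public void visit(NodeToken n) { }
      public void visit(Name n) { n.f0.accept(this); }
   }

   // Hypothetical visitor that logs the overload chosen for each node.
   static class Tracer extends Visitor {
      final StringBuilder log = new StringBuilder();
      @Override public void visit(NodeToken n) { log.append("NodeToken;"); }
      @Override public void visit(Name n) { log.append("Name;"); super.visit(n); }
   }

   // The caller only knows the static type Node; accept() recovers the
   // concrete class and picks the matching visit() overload.
   static String trace(Node root) {
      Tracer t = new Tracer();
      root.accept(t);
      return t.log.toString();
   }

   public static void main(String[] args) {
      Node root = new Name(new NodeToken("java"));
      System.out.println(trace(root));   // prints "Name;NodeToken;"
   }
}
```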

`NodeChoice`

NodeChoice is the class which JTB uses to represent choice points in a grammar. An example of this would be

( "abstract" | "final" | "public" )

JTB would represent the production

void ResultType() : {}
{
  "void" | Type()
}

as a class ResultType with a single child of type NodeChoice. The type stored by this NodeChoice would not be determined until the file was actually parsed. The node stored by a NodeChoice would then be accessible through the choice field. Since the choice is of type Node, typecasts are sometimes necessary to access the fields of the node stored in a NodeChoice.

public class NodeChoice implements Node {
   public NodeChoice(Node node, int whichChoice);
   public void accept(visitor.Visitor v);

   public Node choice;
   public int which;
}

Another feature of NodeChoice is the field which. When determining which of the choices was selected, one option is to use lots of if statements using instanceof. We found this to be quite messy so we added the which field. If the first choice is selected, which equals 0 (following the old programming custom to start counting at 0). If the second choice is taken, which equals 1. The third choice would be 2, etc. Be careful: code that depends on the value of which can break if you later reorder the choices in the grammar.
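As a rough illustration of using which together with a typecast, the sketch below handles the ResultType production shown above. The classes are hand-written stand-ins: accept() is omitted because no visitor is involved, Type is reduced to a single hypothetical token field, and describe() is an illustrative helper name.

```java
// Sketch: branching on NodeChoice.which for  "void" | Type()
class NodeChoiceDemo {
   interface Node { }   // accept() omitted in this stand-in

   static class NodeToken implements Node {
      public String tokenImage;
      public NodeToken(String s) { tokenImage = s; }
   }

   // Hypothetical, simplified Type node holding a single token.
   static class Type implements Node {
      public NodeToken f0;
      public Type(NodeToken n0) { f0 = n0; }
   }

   static class NodeChoice implements Node {
      public Node choice;
      public int which;
      public NodeChoice(Node node, int whichChoice) {
         choice = node;
         which = whichChoice;
      }
   }

   // f0 -> "void" | Type()
   static class ResultType implements Node {
      public NodeChoice f0;
      public ResultType(NodeChoice n0) { f0 = n0; }
   }

   static String describe(ResultType n) {
      switch (n.f0.which) {
         case 0:   // first alternative: the token "void"
            return ((NodeToken) n.f0.choice).tokenImage;
         case 1:   // second alternative: Type()
            return "type " + ((Type) n.f0.choice).f0.tokenImage;
         default:
            throw new IllegalStateException("unexpected choice " + n.f0.which);
      }
   }

   public static void main(String[] args) {
      ResultType v = new ResultType(new NodeChoice(new NodeToken("void"), 0));
      ResultType t = new ResultType(new NodeChoice(new Type(new NodeToken("int")), 1));
      System.out.println(describe(v));   // prints "void"
      System.out.println(describe(t));   // prints "type int"
   }
}
```

The case labels are tied to the order of the alternatives in the grammar, which is exactly the fragility noted above.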

`NodeList`

NodeList is the class used by JTB to represent lists. An example of a list would be

( "[" Expression() "]" )+

JTB would represent the production

void ArrayDimensions() :
{}
{
  ( "[" Expression() "]" )+ ( "[" "]" )*
}

as a class ArrayDimensions with children NodeList and NodeListOptional respectively. NodeLists use java.util.Vector to store the lists of nodes. Like NodeChoice, typecasts may occasionally be necessary to access fields of nodes contained in the list.

public class NodeList implements Node {
   public NodeList();
   public void addNode(Node n);
   public Enumeration elements();
   public Node elementAt(int i);
   public int size();
   public void accept(visitor.Visitor v);

   public Vector nodes;
}

addNode() is used by the tree-building code to add nodes to the list.
elements() is similar to the method of the same name in Vector, returning an Enumeration of the elements in the list.
elementAt() returns the node at the ith position in the list (starting at 0, naturally).
size() returns the number of elements in the list.
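The methods listed above might be used as in the following sketch. It assumes a hand-written NodeList stand-in backed by a Vector, a hypothetical list whose elements are all NodeTokens, and illustrative helpers named listOf() and join(); generics are used here for type safety, whereas the synopsis above shows raw types.

```java
import java.util.Enumeration;
import java.util.Vector;

// Sketch: walking a NodeList with elements(), elementAt() and size().
class NodeListDemo {
   interface Node { }   // accept() omitted in this stand-in

   static class NodeToken implements Node {
      public String tokenImage;
      public NodeToken(String s) { tokenImage = s; }
   }

   static class NodeList implements Node {
      public Vector<Node> nodes = new Vector<Node>();
      public void addNode(Node n) { nodes.addElement(n); }
      public Enumeration<Node> elements() { return nodes.elements(); }
      public Node elementAt(int i) { return nodes.elementAt(i); }
      public int size() { return nodes.size(); }
   }

   // Hypothetical helper: builds a list of tokens by hand.
   static NodeList listOf(String... images) {
      NodeList list = new NodeList();
      for (String image : images) {
         list.addNode(new NodeToken(image));
      }
      return list;
   }

   // Joins the token images with spaces; each element must be cast from
   // Node to its concrete type before its fields can be read.
   static String join(NodeList list) {
      StringBuilder sb = new StringBuilder();
      for (Enumeration<Node> e = list.elements(); e.hasMoreElements(); ) {
         NodeToken t = (NodeToken) e.nextElement();
         if (sb.length() > 0) sb.append(' ');
         sb.append(t.tokenImage);
      }
      return sb.toString();
   }

   public static void main(String[] args) {
      NodeList list = listOf("[", "]");
      System.out.println(join(list));     // prints "[ ]"
      System.out.println(list.size());    // prints 2
      System.out.println(((NodeToken) list.elementAt(1)).tokenImage);   // prints "]"
   }
}
```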

`NodeListOptional`

This class is very similar to NodeList except that it represents optional lists such as

( ImportDeclaration() )*

where the list may or may not appear in the input.

public class NodeListOptional implements Node {
   public NodeListOptional();
   public void addNode(Node n);
   public Enumeration elements();
   public Node elementAt(int i);
   public int size();
   public boolean present();
   public void accept(visitor.Visitor v);

   public Vector nodes;
}

The only difference between this class and NodeList is the method present(), which returns false if the list is not present, true if it is.

`NodeOptional`

This class is used to store optional constructs such as

[ "." "*" ]

JTB would represent the production

void ImportDeclaration() : {}
{
  "import" Name() [ "." "*" ] ";"
}

as a class ImportDeclaration with children of types NodeToken, Name, NodeOptional, and NodeToken. This class stores the optional as a Node reference, so once again, typecasts may be necessary to access fields of an optional.

public class NodeOptional implements Node {
   public NodeOptional();
   public void addNode(Node n);
   public void accept(visitor.Visitor v);
   public boolean present();

   public Node node;
}

Here, present() works the same way as in NodeListOptional.
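A visitor or helper might test present() before touching the optional part, as in the sketch below. The classes are hand-written stand-ins: present() is assumed to report whether addNode() has been called, f1 is a plain token instead of a Name, and a single placeholder token is stored in the optional where a real tree would hold a NodeSequence. The helpers importOf() and importedName() are hypothetical.

```java
// Sketch: checking NodeOptional.present() for  [ "." "*" ]
class NodeOptionalDemo {
   interface Node { }   // accept() omitted in this stand-in

   static class NodeToken implements Node {
      public String tokenImage;
      public NodeToken(String s) { tokenImage = s; }
   }

   static class NodeOptional implements Node {
      public Node node;                                    // assumed null while absent
      public void addNode(Node n) { node = n; }
      public boolean present() { return node != null; }    // assumed behavior
   }

   // Reduced ImportDeclaration: f1 is a plain token here instead of a Name.
   static class ImportDeclaration implements Node {
      public NodeToken f0;
      public NodeToken f1;
      public NodeOptional f2;
      public NodeToken f3;
      public ImportDeclaration(NodeToken n0, NodeToken n1,
                               NodeOptional n2, NodeToken n3) {
         f0 = n0; f1 = n1; f2 = n2; f3 = n3;
      }
   }

   // Hypothetical helper that builds the node by hand.
   static ImportDeclaration importOf(String name, boolean onDemand) {
      NodeOptional star = new NodeOptional();
      if (onDemand) {
         // Placeholder: a real tree stores a NodeSequence of "." and "*" here.
         star.addNode(new NodeToken(".*"));
      }
      return new ImportDeclaration(
         new NodeToken("import"), new NodeToken(name), star, new NodeToken(";"));
   }

   static String importedName(ImportDeclaration n) {
      return n.f1.tokenImage + (n.f2.present() ? ".*" : "");
   }

   public static void main(String[] args) {
      System.out.println(importedName(importOf("java.util", true)));          // prints "java.util.*"
      System.out.println(importedName(importOf("java.util.Vector", false)));  // prints "java.util.Vector"
   }
}
```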

`NodeSequence`

By now you might be wondering how an automatic class like NodeOptional handles a construct with more than one node in it, like this

[ "extends" Name() ]

This is accomplished with a NodeSequence. This class is used to represent nested lists of nodes. Using the above construct, JTB would generate a NodeOptional, and inside it would be a NodeSequence with the NodeToken "extends" as the 0th element and Name() as the first. The interface for NodeSequence is identical to that of NodeList:

public class NodeSequence implements Node {
   public NodeSequence();
   public void addNode(Node n);
   public Node elementAt(int i);
   public Enumeration elements();
   public int size();
   public void accept(visitor.Visitor v);

   public Vector nodes;
}
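Reading the pieces of [ "extends" Name() ] back out of the tree takes two typecasts, roughly as sketched here. The classes are hand-written stand-ins: Name is reduced to one hypothetical token field, present() is assumed to test for a stored node, and extendsClause() and superclassOf() are illustrative helper names.

```java
import java.util.Vector;

// Sketch: unpacking a NodeSequence stored inside a NodeOptional.
class NodeSequenceDemo {
   interface Node { }   // accept() omitted in this stand-in

   static class NodeToken implements Node {
      public String tokenImage;
      public NodeToken(String s) { tokenImage = s; }
   }

   // Hypothetical, simplified Name node holding a single token.
   static class Name implements Node {
      public NodeToken f0;
      public Name(NodeToken n0) { f0 = n0; }
   }

   static class NodeOptional implements Node {
      public Node node;
      public void addNode(Node n) { node = n; }
      public boolean present() { return node != null; }   // assumed behavior
   }

   static class NodeSequence implements Node {
      public Vector<Node> nodes = new Vector<Node>();
      public void addNode(Node n) { nodes.addElement(n); }
      public Node elementAt(int i) { return nodes.elementAt(i); }
      public int size() { return nodes.size(); }
   }

   // Hypothetical helper: builds [ "extends" Name() ]; null means absent.
   static NodeOptional extendsClause(String superName) {
      NodeOptional opt = new NodeOptional();
      if (superName != null) {
         NodeSequence seq = new NodeSequence();
         seq.addNode(new NodeToken("extends"));            // element 0
         seq.addNode(new Name(new NodeToken(superName)));  // element 1
         opt.addNode(seq);
      }
      return opt;
   }

   // Returns the superclass name, or null when the clause is absent.
   static String superclassOf(NodeOptional clause) {
      if (!clause.present()) {
         return null;
      }
      NodeSequence seq = (NodeSequence) clause.node;   // first cast
      Name name = (Name) seq.elementAt(1);             // second cast
      return name.f0.tokenImage;
   }

   public static void main(String[] args) {
      System.out.println(superclassOf(extendsClause("Base")));   // prints "Base"
      System.out.println(superclassOf(extendsClause(null)));     // prints "null"
   }
}
```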

`NodeToken`

This class is used by JTB to store all tokens into the tree.

public class NodeToken implements Node {
   public NodeToken(String s);
   public String toString();
   public void accept(visitor.Visitor v);

   public String tokenImage;
}

The tokens are simply stored as strings. The field tokenImage can be accessed directly, but toString() will automatically be called if you attempt to print a NodeToken.
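The two ways of reading a token's text can be compared in a short sketch. It assumes a hand-written NodeToken stand-in (without the Node interface) whose toString() returns tokenImage; render() is a hypothetical helper.

```java
// Sketch: printing a NodeToken via toString() versus reading tokenImage.
class NodeTokenDemo {
   static class NodeToken {
      public String tokenImage;
      public NodeToken(String s) { tokenImage = s; }
      @Override
      public String toString() { return tokenImage; }   // assumed behavior
   }

   // Joins tokens with spaces, relying on the implicit toString() call.
   static String render(NodeToken... tokens) {
      StringBuilder sb = new StringBuilder();
      for (NodeToken t : tokens) {
         if (sb.length() > 0) sb.append(' ');
         sb.append(t);   // append(Object) calls t.toString()
      }
      return sb.toString();
   }

   public static void main(String[] args) {
      NodeToken t = new NodeToken("import");
      System.out.println(t);              // implicit toString(), prints "import"
      System.out.println(t.tokenImage);   // direct field access, prints "import"
   }
}
```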

JTB Command-Line Options

JTB supports several command-line options:

Option	Description
`-h`	Displays a help message including a table with brief descriptions of these options.
`-o=NAME`	Specifies the filename JTB should use to output the annotated grammar rather than use the default `jtb.out.jj`.
`-np=NAME`	Specifies the directory and package in which JTB should place the generated syntax tree classes rather than use the default `syntaxtree`. *Note:* for nested packages, JTB assumes the current directory is the one directly above the package stated. For example, if you used "`-np=foo.bar.bletch`", JTB will assume you are in the directory `foo/bar` and will generate a directory called `bletch` to store the node classes.
`-vp=NAME`	Specifies the directory and package in which JTB should place the generated visitor classes rather than use the default `visitor`. The above note for the `-np` option applies to this option as well.
`-si`	Reads input from standard input (typically the keyboard) rather than an input grammar file.
`-vn`	Causes JTB to not use overloaded `visit()` methods. For a nonterminal NT, the visit method will be `visitNT()` rather than `visit()`. This option allows backward compatibility with versions of JTB before 1.0.

In addition, there are several command-line options for debugging JTB itself. Most users will probably not need them; should the need arise, the -h option explains them.

Still have questions? Suggestions on improving this document? Feel free to mail Wanjun Wang or Jens Palsberg.

Maintained by Wanjun Wang, wanjun@purdue.edu.

Created January 6, 1999.
Last modified June 26, 1999.