JTB Documentation

JTB 1.1 is the current version of JTB. This document describes how to use this version of JTB. If you require information on a previous version, see the Old Documentation page.

Introduction

The Java Tree Builder (JTB) is a tool used to automatically generate syntax trees with the Java Compiler Compiler (JavaCC) parser generator. If you are unfamiliar with the Visitor design pattern, please see the section entitled "Why Visitors?". We also recommend that you see the Tutorial prior to using JTB.

If you are converting to a newer version of JTB, please see the release notes for the new version.

Overview of Generated Files

To begin using JTB, simply run it using your grammar file as an argument (a list of command-line parameters can be obtained by running JTB without any arguments). This will generate the following:

The file jtb.out.jj, the original grammar file, now with syntax tree building actions inserted.
The subdirectory/package syntaxtree which contains a Java class for each production in the grammar.
The subdirectory/package visitor which contains Visitor.java, the default visitor interface, as well as DepthFirstVisitor.java, a default implementation which visits each node of the tree in depth-first order.

To generate your parser, simply run JavaCC using jtb.out.jj as the grammar file.

Let's take a look at all the files and directories JTB generates.

`jtb.out.jj`

This file is the same as the input grammar file except that it now contains code for building the syntax tree during parsing. Typically, this file can be left alone after generation. The only thing that needs to be done to it is to run it through JavaCC to generate your parser.

`syntaxtree`

This directory contains syntax tree node classes generated based on the productions in your JavaCC grammar. Each production will have its own class. If your grammar contains 97 productions, this directory will contain 97 classes (plus the special automatically generated nodes--these will be discussed later), with names corresponding to the left-hand side names of the productions. Like jtb.out.jj, after generation these files don't need to be edited. Generate them once, compile them once, and forget about them.

Let's examine one of the classes generated from a production. Take, for example, the following production (from the Java1.1.jj grammar):

void ImportDeclaration() :
{}
{
  "import" Name() [ "." "*" ] ";"
}

This production will generate the file ImportDeclaration.java in the directory (and package) syntaxtree. This file will look like this:

//
// Generated by JTB 1.1.2
//

package syntaxtree;

/**
 * Grammar production:
 * f0 -> "import"
 * f1 -> Name()
 * f2 -> [ "." "*" ]
 * f3 -> ";"
 */
public class ImportDeclaration implements Node {
   public NodeToken f0;
   public Name f1;
   public NodeOptional f2;
   public NodeToken f3;

   public ImportDeclaration(NodeToken n0, Name n1,
   NodeOptional n2, NodeToken n3) {
      f0 = n0;
      f1 = n1;
      f2 = n2;
      f3 = n3;
   }

   public ImportDeclaration(Name n0, NodeOptional n1) {
      f0 = new NodeToken("import");
      f1 = n0;
      f2 = n1;
      f3 = new NodeToken(";");
   }

   public void accept(visitor.Visitor v) {
      v.visit(this);
   }
}

Let us now examine this file from the top down.

The first set of comments obviously shows which version of JTB created this file. The second group of comments is for your benefit, showing the names of the fields of this class (the children of the node), and what parts of the original production they represent. All parts of a production are represented in the tree, including tokens.

Notice that this class is in the package "syntaxtree". The purpose of separating the generated tree node classes into their own package is that it greatly simplifies file organization, particularly when the grammar contains a large number of productions. If the grammar is stable and not subject to change, once these classes are generated and compiled, it's not necessary to pay them any more attention. All of the work is done in the visitor classes.

Next you'll note that this class implements an interface named Node. This is one of eight tree node classes and interfaces automatically generated for every grammar. These classes are as follows:

**Automatically-Generated Tree Node Interface and Classes**
`Node`	Node interface that all tree nodes implement.
`NodeListInterface`	List interface that `NodeList`, `NodeListOptional`, and `NodeSequence` implement.
`NodeChoice`	Represents a grammar choice such as `( A | B )`
`NodeList`	Represents a list such as `( A )+`
`NodeListOptional`	Represents an optional list such as `( A )*`
`NodeOptional`	Represents an optional such as `[ A ]` or `( A )?`
`NodeSequence`	Represents a nested sequence of nodes
`NodeToken`	Represents a token string such as `"package"`

These will be discussed in greater detail below.

Next come the member variables of the ImportDeclaration class. These are generated based on the RHS of the production. Their type depends on the various items in the RHS and their names begin with f0 and work their way up. You may be wondering why these variables are declared as public. Since the visitors which must access these fields reside in a different package than the syntax tree nodes, package visibility cannot be used. We decided that breaking encapsulation was a necessary evil in this case.

The next portion of the generated class is the standard constructor. It is called from the tree-building actions in the annotated grammar so you will probably not need to use it.

Following the first constructor is a convenience constructor with the constant tokens of the production already filled-in by the appropriate NodeToken. This constructor's purpose is to help in manual construction of syntax trees.

After the constructor is the accept() method. This method is the way in which visitors interact with the class.
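As a rough illustration of manual tree construction, the sketch below builds the node for `import java.util.Vector;` with the convenience constructor. The `NodeToken`, `NodeOptional`, and `Name` classes are cut-down stand-ins written for this example so that it compiles on its own. In particular, a `Name` that holds a single string is an assumption (the generated `Name` class has its own `f0`, `f1`, ... fields), and `accept()` is omitted.

```java
// Cut-down stand-ins for generated classes; accept() is omitted in this sketch.
interface Node { }

class NodeToken implements Node {
   public String tokenImage;
   public NodeToken(String s) { tokenImage = s; }
}

class NodeOptional implements Node {
   public Node node;                       // stays null when the optional is absent
   public void addNode(Node n) { node = n; }
   public boolean present() { return node != null; }
}

// Assumption: a single string stands in for the real Name node's children.
class Name implements Node {
   public String text;
   public Name(String s) { text = s; }
}

class ImportDeclaration implements Node {
   public NodeToken f0;
   public Name f1;
   public NodeOptional f2;
   public NodeToken f3;

   // Convenience constructor: the constant tokens are filled in automatically.
   public ImportDeclaration(Name n0, NodeOptional n1) {
      f0 = new NodeToken("import");
      f1 = n0;
      f2 = n1;
      f3 = new NodeToken(";");
   }
}

class ManualTreeDemo {
   // Builds the tree for: import java.util.Vector;
   static ImportDeclaration build() {
      return new ImportDeclaration(new Name("java.util.Vector"), new NodeOptional());
   }

   public static void main(String[] args) {
      ImportDeclaration d = build();
      System.out.println(d.f0.tokenImage + " " + d.f1.text + d.f3.tokenImage);
   }
}
```

Only the non-constant children (`Name` and the optional) have to be supplied by the caller.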

`visitor`

This directory contains the generated Visitor interface and DepthFirstVisitor class and is where the visitors you write can be placed as well. The Visitor interface contains one method declaration per production in the grammar, plus one method declaration for each of the six automatically-generated classes. DepthFirstVisitor is a convenience class which implements the Visitor interface. Its default methods simply visit each node of the tree, calling the accept() method of each node's children.

All visitor classes must implement the Visitor interface either directly or by subclassing a class which does so (such as DepthFirstVisitor).

With regard to DepthFirstVisitor, our intent is for the programmer to have to override only those methods for which specific actions must be performed. For example, in a visitor which simply counts the number of assignment statements in a Java source file, only the method visit(Assignment n) would need to be overridden.

Continuing our above example is the visit(ImportDeclaration n) method of class DepthFirstVisitor:

/**
 * f0 -> "import"
 * f1 -> Name()
 * f2 -> [ "." "*" ]
 * f3 -> ";"
 */
public void visit(ImportDeclaration n) {
   n.f0.accept(this);
   n.f1.accept(this);
   n.f2.accept(this);
   n.f3.accept(this);
}

The comments above each visit method are for the programmer's benefit, showing which field corresponds to which part of the production. In this example n.f0 is a reference to one of the automatically generated classes, NodeToken. n.f1 refers to a nonterminal of type Name. n.f2 refers to a NodeOptional which stores a NodeSequence (more on this later). n.f3 refers to another NodeToken.
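The idea of overriding only the methods of interest, as in the assignment-counting visitor mentioned above, can be sketched as follows. This is a minimal, self-contained approximation: `Assignment`, `StatementList`, and the two-method `Visitor` interface are hypothetical stand-ins, far smaller than what JTB would generate for a real grammar.

```java
import java.util.Vector;

// Stand-ins for JTB-generated types (hypothetical and heavily simplified).
interface Node { void accept(Visitor v); }

interface Visitor {
   void visit(Assignment n);
   void visit(StatementList n);
}

// Simplified node for an assignment; a generated node would have f0, f1, ... fields.
class Assignment implements Node {
   public void accept(Visitor v) { v.visit(this); }
}

// Simplified container node holding child nodes.
class StatementList implements Node {
   public Vector<Node> nodes = new Vector<Node>();
   public void accept(Visitor v) { v.visit(this); }
}

// Mimics the generated DepthFirstVisitor: each method just visits the children.
class DepthFirstVisitor implements Visitor {
   public void visit(Assignment n) { /* no children in this sketch */ }
   public void visit(StatementList n) {
      for (Node child : n.nodes) {
         child.accept(this);
      }
   }
}

// Only the method of interest is overridden; traversal is inherited.
class AssignmentCounter extends DepthFirstVisitor {
   public int count = 0;

   @Override
   public void visit(Assignment n) {
      count++;
      super.visit(n);   // keep the default traversal of the node's children
   }
}

class CountDemo {
   static int countAssignments() {
      StatementList inner = new StatementList();
      inner.nodes.add(new Assignment());
      StatementList root = new StatementList();
      root.nodes.add(new Assignment());
      root.nodes.add(inner);
      root.nodes.add(new Assignment());

      AssignmentCounter counter = new AssignmentCounter();
      root.accept(counter);
      return counter.count;
   }

   public static void main(String[] args) {
      System.out.println(countAssignments());   // prints 3
   }
}
```

Calling `super.visit(n)` in the override preserves the depth-first traversal, so nested assignments would still be reached in a grammar where assignments have children.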

The Automatically Generated Classes

Six classes and two interfaces are automatically generated for every grammar file. The six classes are responsible for the various EBNF grammar constructs such as ( )+, ( )*, ( )?, etc.

`Node`

The interface Node is implemented by all syntax tree nodes. Node looks like this:

public interface Node extends java.io.Serializable {
   public void accept(visitor.Visitor v);
}

All tree node classes implement the accept() method. In the case of all the automatically-generated classes, the accept() method simply calls the corresponding visit(XXXX n) (where XXXX is the name of the production) method of the visitor passed to it. Note that the visit() methods are overloaded, i.e. the distinguishing feature is the argument each takes, as opposed to its name.

Two new features are present in JTB 1.1. The first is that Node extends java.io.Serializable, meaning that you can now serialize your trees (or subtrees) to an output stream and read them back in. If you are not familiar with object serialization, see the Java documentation on the java.io.Serializable interface.
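A minimal sketch of serializing a tree and reading it back is shown below. It uses only standard `java.io` classes. The `Node`, `NodeToken`, and `Pair` types are simplified stand-ins invented for this example (`Pair` does not correspond to any generated class), and `accept()` is left out to keep the block short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for the generated Node interface (accept() omitted in this sketch).
interface Node extends Serializable { }

// Simplified stand-in for NodeToken: stores only the token image.
class NodeToken implements Node {
   private static final long serialVersionUID = 1L;
   public String tokenImage;
   public NodeToken(String s) { tokenImage = s; }
}

// Hypothetical two-child node used only for this example.
class Pair implements Node {
   private static final long serialVersionUID = 1L;
   public NodeToken f0, f1;
   public Pair(NodeToken n0, NodeToken n1) { f0 = n0; f1 = n1; }
}

class SerializeDemo {
   // Writes the tree to an in-memory byte array and reads a copy back.
   static Node roundTrip(Node root) {
      try {
         ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bytes);
         out.writeObject(root);
         out.close();

         ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
         Node copy = (Node) in.readObject();
         in.close();
         return copy;
      } catch (IOException | ClassNotFoundException e) {
         throw new IllegalStateException("serialization round trip failed", e);
      }
   }

   public static void main(String[] args) {
      Pair copy = (Pair) roundTrip(new Pair(new NodeToken("import"), new NodeToken(";")));
      System.out.println(copy.f0.tokenImage + " " + copy.f1.tokenImage);
   }
}
```

Replacing the byte array streams with file streams would write the tree to disk instead.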

`NodeListInterface`

The interface NodeListInterface is implemented by NodeList, NodeListOptional, and NodeSequence. NodeListInterface looks like this:

public interface NodeListInterface extends Node {
   public void addNode(Node n);
   public Node elementAt(int i);
   public java.util.Enumeration elements();
   public int size();
}

You probably won't need to worry about this interface. It can be useful, though, when writing code which only deals with the Vector-like functionality of any of the three classes listed above.

addNode() is used by the tree-building code to add nodes to the list.
elements() is similar to the method of the same name in Vector, returning an Enumeration of the elements in the list.
elementAt() returns the node at the ith position in the list (starting at 0, naturally).
size() returns the number of elements in the list.
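The following sketch shows code written against NodeListInterface only, so that it would work with any of the three list classes. The types here are stand-ins written for the example: `accept()` is omitted, and `Enumeration<Node>` is used where the interface shown above uses the raw `java.util.Enumeration`. The `join()` helper is hypothetical.

```java
import java.util.Enumeration;
import java.util.Vector;

// Stand-ins for the generated types; accept() is omitted to keep the sketch short.
interface Node { }

interface NodeListInterface extends Node {
   void addNode(Node n);
   Node elementAt(int i);
   Enumeration<Node> elements();
   int size();
}

class NodeToken implements Node {
   public String tokenImage;
   public NodeToken(String s) { tokenImage = s; }
   public String toString() { return tokenImage; }
}

// Simplified NodeList backed by a Vector, as described in the text.
class NodeList implements NodeListInterface {
   public Vector<Node> nodes = new Vector<Node>();
   public void addNode(Node n) { nodes.addElement(n); }
   public Node elementAt(int i) { return nodes.elementAt(i); }
   public Enumeration<Node> elements() { return nodes.elements(); }
   public int size() { return nodes.size(); }
}

class ListDemo {
   // Hypothetical helper: joins the string form of every element with spaces.
   static String join(NodeListInterface list) {
      StringBuilder sb = new StringBuilder();
      for (Enumeration<Node> e = list.elements(); e.hasMoreElements(); ) {
         if (sb.length() > 0) {
            sb.append(' ');
         }
         sb.append(e.nextElement());
      }
      return sb.toString();
   }

   public static void main(String[] args) {
      NodeList list = new NodeList();
      list.addNode(new NodeToken("["));
      list.addNode(new NodeToken("]"));
      System.out.println(join(list) + " (" + list.size() + " elements)");
   }
}
```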

`NodeChoice`

NodeChoice is the class which JTB uses to represent choice points in a grammar. An example of this would be

( "abstract" | "final" | "public" )

JTB would represent the production

void ResultType() : {}
{
  "void" | Type()
}

as a class ResultType with a single child of type NodeChoice. The type stored by this NodeChoice would not be determined until the file was actually parsed. The node stored by a NodeChoice would then be accessible through the choice field. Since the choice is of type Node, typecasts are sometimes necessary to access the fields of the node stored in a NodeChoice.

public class NodeChoice implements Node {
   public NodeChoice(Node node, int whichChoice);
   public void accept(visitor.Visitor v);

   public Node choice;
   public int which;
}

Another feature of NodeChoice is the field which. When determining which of the choices was selected, one option is to use lots of if statements using instanceof. We found this to be quite messy so we added the which field. If the first choice is selected, which equals 0 (following the old programming custom to start counting at 0). If the second choice is taken, which equals 1. The third choice would be 2, etc. This allows a programmer to use a much cleaner switch statement. Note that your code could potentially break if the order of the choices is changed in the grammar.
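A switch on the which field for the ResultType production above might look like the sketch below. The classes are simplified stand-ins (no `accept()`), and `Type` holding a single `name` string is an assumption made so that the example is self-contained.

```java
// Simplified stand-ins for generated classes; accept() is omitted.
interface Node { }

class NodeToken implements Node {
   public String tokenImage;
   public NodeToken(String s) { tokenImage = s; }
}

// Assumption: a real Type node would have its own f0, f1, ... children.
class Type implements Node {
   public String name;
   public Type(String s) { name = s; }
}

class NodeChoice implements Node {
   public Node choice;
   public int which;
   public NodeChoice(Node node, int whichChoice) {
      choice = node;
      which = whichChoice;
   }
}

// ResultType -> "void" | Type()
class ResultType implements Node {
   public NodeChoice f0;
   public ResultType(NodeChoice n0) { f0 = n0; }
}

class ChoiceDemo {
   static String describe(ResultType n) {
      switch (n.f0.which) {
         case 0:   // first alternative: "void"
            return "void";
         case 1:   // second alternative: Type(), so a typecast is needed
            return "type " + ((Type) n.f0.choice).name;
         default:
            throw new IllegalStateException("unexpected choice " + n.f0.which);
      }
   }

   public static void main(String[] args) {
      System.out.println(describe(new ResultType(new NodeChoice(new NodeToken("void"), 0))));
      System.out.println(describe(new ResultType(new NodeChoice(new Type("int"), 1))));
   }
}
```

The `default` branch makes a reordered or extended grammar fail loudly rather than silently taking the wrong case.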

`NodeList`

NodeList is the class used by JTB to represent lists. An example of a list would be

( "[" Expression() "]" )+

JTB would represent the production

void ArrayDimensions() :
{}
{
  ( "[" Expression() "]" )+ ( "[" "]" )*
}

as a class ArrayDimensions with children NodeList and NodeListOptional respectively. A NodeList uses a java.util.Vector to store its list of nodes. Like NodeChoice, typecasts may occasionally be necessary to access fields of nodes contained in the list.

public class NodeList implements NodeListInterface {
   public NodeList();
   public void addNode(Node n);
   public Enumeration elements();
   public Node elementAt(int i);
   public int size();
   public void accept(visitor.Visitor v);

   public Vector nodes;
}

`NodeListOptional`

This class is very similar to NodeList except that it represents optional lists such as

( ImportDeclaration() )*

where the list may or may not appear in the input.

public class NodeListOptional implements NodeListInterface {
   public NodeListOptional();
   public void addNode(Node n);
   public Enumeration elements();
   public Node elementAt(int i);
   public int size();
   public boolean present();
   public void accept(visitor.Visitor v);

   public Vector nodes;
}

The only difference between this class and NodeList is the method present(), which returns false if the list is not present, true if it is.

`NodeOptional`

This class is used to store optional constructs such as

[ "." "*" ]

JTB would represent the production

void ImportDeclaration() : {}
{
  "import" Name() [ "." "*" ] ";"
}

as a class ImportDeclaration with children of types NodeToken, Name, NodeOptional, and NodeToken. This class stores the optional as a Node reference, so once again, typecasts may be necessary to access fields of an optional.

public class NodeOptional implements Node {
   public NodeOptional();
   public void addNode(Node n);
   public void accept(visitor.Visitor v);
   public boolean present();

   public Node node;
}

Here, present() works the same way as in NodeListOptional.
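The sketch below shows the usual pattern of checking present() before casting the stored node. The production `Flag -> [ "static" ] <IDENTIFIER>` is hypothetical and exists only for this example. The `NodeOptional` stand-in assumes that present() is true exactly when a node has been added.

```java
// Simplified stand-ins for generated classes; accept() is omitted.
interface Node { }

class NodeToken implements Node {
   public String tokenImage;
   public NodeToken(String s) { tokenImage = s; }
}

// Assumption: present() is true exactly when a node has been added.
class NodeOptional implements Node {
   public Node node;
   public void addNode(Node n) { node = n; }
   public boolean present() { return node != null; }
}

// Hypothetical production:  Flag -> [ "static" ] <IDENTIFIER>
class Flag implements Node {
   public NodeOptional f0;
   public NodeToken f1;
   public Flag(NodeOptional n0, NodeToken n1) { f0 = n0; f1 = n1; }
}

class OptionalDemo {
   static String describe(Flag n) {
      if (n.f0.present()) {
         // The optional stores a plain Node, so a typecast is needed.
         NodeToken modifier = (NodeToken) n.f0.node;
         return modifier.tokenImage + " " + n.f1.tokenImage;
      }
      return n.f1.tokenImage;
   }

   public static void main(String[] args) {
      NodeOptional opt = new NodeOptional();
      opt.addNode(new NodeToken("static"));
      System.out.println(describe(new Flag(opt, new NodeToken("x"))));
      System.out.println(describe(new Flag(new NodeOptional(), new NodeToken("y"))));
   }
}
```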

`NodeSequence`

By now you might be wondering how an automatic class like NodeOptional handles a construct with more than one node in it, like this

[ "extends" Name() ]

This is accomplished with a NodeSequence. This class is used to represent nested lists of nodes. Using the above construct, JTB would generate a NodeOptional, and inside it would be a NodeSequence with the NodeToken "extends" as the 0th element and Name() as the first. The interface for NodeSequence is identical to NodeList:

public class NodeSequence implements NodeListInterface {
   public NodeSequence();
   public void addNode(Node n);
   public Node elementAt(int i);
   public Enumeration elements();
   public int size();
   public void accept(visitor.Visitor v);

   public Vector nodes;
}
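To make the nesting concrete, the sketch below unpacks `[ "extends" Name() ]` by casting the optional's node to a NodeSequence and reading element 1. All classes are simplified stand-ins. In particular, a `Name` that holds a single string is an assumption, and `superclassOf()` is a hypothetical helper.

```java
import java.util.Vector;

// Simplified stand-ins for generated classes; accept() is omitted.
interface Node { }

class NodeToken implements Node {
   public String tokenImage;
   public NodeToken(String s) { tokenImage = s; }
}

// Assumption: a single string stands in for the real Name node's children.
class Name implements Node {
   public String text;
   public Name(String s) { text = s; }
}

class NodeSequence implements Node {
   public Vector<Node> nodes = new Vector<Node>();
   public void addNode(Node n) { nodes.addElement(n); }
   public Node elementAt(int i) { return nodes.elementAt(i); }
   public int size() { return nodes.size(); }
}

class NodeOptional implements Node {
   public Node node;
   public void addNode(Node n) { node = n; }
   public boolean present() { return node != null; }
}

class SequenceDemo {
   // Returns the name stored in [ "extends" Name() ], or null if the optional is absent.
   static String superclassOf(NodeOptional opt) {
      if (!opt.present()) {
         return null;
      }
      NodeSequence seq = (NodeSequence) opt.node;
      Name name = (Name) seq.elementAt(1);   // element 0 is the "extends" token
      return name.text;
   }

   public static void main(String[] args) {
      NodeSequence seq = new NodeSequence();
      seq.addNode(new NodeToken("extends"));
      seq.addNode(new Name("Object"));
      NodeOptional opt = new NodeOptional();
      opt.addNode(seq);
      System.out.println(superclassOf(opt));
   }
}
```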

As of JTB 1.1, NodeSequence is also the container node for parenthesized expressions, that is, any parentheses that neither enclose a choice nor are followed by a "*", "+", or "?". For example:

void foo() : {}
{
   A() ( B() C() ) D()
}

The expansion units within the parentheses are placed in a NodeSequence. In this case, JTB will generate a node called foo with its children being of types A, NodeSequence, and D. The NodeSequence will contain two elements of respective types B and C.

Since we are undecided if this type of construct is necessary, and since these extra parentheses caused JTB to behave incorrectly prior to version 1.1, for now these parentheses will be flagged as a warning by the semantic checker. We may remove this warning if we decide there is a use for this construct. Note that you may disable the semantic checker by using the -e command-line option if desired.

`NodeToken`

This class is used by JTB to store all tokens into the tree, including JavaCC "special tokens" (if the -tk command-line option is used). In addition, each NodeToken contains information about each token, including its starting and ending column and line numbers.

public class NodeToken implements Node {
   public NodeToken(String s);
   public NodeToken(String s, int kind, int beginLine,
      int beginColumn, int endLine, int endColumn);
   public String toString();
   public void accept(visitor.Visitor v);

   public String tokenImage;

   // -1 for these ints means no position info is available.
   public int beginLine, beginColumn, endLine, endColumn;

   // Equal to the JavaCC token "kind" integer.
   // -1 if not available.
   public int kind;

   // Special Token methods below
   public NodeToken getSpecialAt(int i);
   public int numSpecials();
   public void addSpecial(NodeToken s);
   public void trimSpecials();
   public String withSpecials();

   public Vector specialTokens;
}

The tokens are simply stored as strings. The field tokenImage can be accessed directly, and the toString() method returns the same string.

Also available is the kind integer. JavaCC assigns each type of token a unique integer to identify it. This integer is now available in each JTB NodeToken. For more information on using the kind integer, see the JavaCC documentation.

If the -tk command-line option is used, JTB will also store special tokens in the tree (see the JavaCC documentation for information on special tokens). Since these tokens have no place in the syntax tree structure, JTB stores the special token in the next regular token which follows it. If multiple special tokens appear before a regular token, that token's NodeToken object will store the special tokens in the specialTokens Vector.

The getSpecialAt(), numSpecials(), and addSpecial() methods are self-explanatory; they function like the similarly-named methods in the Vector class.
trimSpecials() calls the trimToSize() method of Vector in order to conserve memory. It is called by the parser during tree construction.
withSpecials() acts like the toString() method except that it returns a string with all of the special tokens present (in a somewhat messy, unformatted manner--we recommend you override the visit(NodeToken) method to print special tokens as desired).
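As one possible way to print special tokens in a controlled format, the sketch below walks a token's specials with numSpecials() and getSpecialAt(). The `NodeToken` here is a stand-in containing only the members needed for the example. Creating the Vector lazily is an assumption about the implementation, and `render()` is a hypothetical helper, not the output format of withSpecials().

```java
import java.util.Vector;

// Stand-in for NodeToken with only the special-token members used below.
class NodeToken {
   public String tokenImage;
   public Vector<NodeToken> specialTokens;   // assumption: created on first use

   public NodeToken(String s) { tokenImage = s; }

   public void addSpecial(NodeToken s) {
      if (specialTokens == null) {
         specialTokens = new Vector<NodeToken>();
      }
      specialTokens.addElement(s);
   }

   public int numSpecials() {
      return specialTokens == null ? 0 : specialTokens.size();
   }

   public NodeToken getSpecialAt(int i) {
      return specialTokens.elementAt(i);
   }
}

class SpecialsDemo {
   // Hypothetical helper: each special token on its own line, then the regular token.
   static String render(NodeToken t) {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < t.numSpecials(); i++) {
         sb.append(t.getSpecialAt(i).tokenImage).append('\n');
      }
      return sb.append(t.tokenImage).toString();
   }

   public static void main(String[] args) {
      NodeToken t = new NodeToken("class");
      t.addSpecial(new NodeToken("// first comment"));
      t.addSpecial(new NodeToken("// second comment"));
      System.out.println(render(t));
   }
}
```

In a real visitor, the same loop would sit inside an overridden visit(NodeToken) method.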

JTB Command-Line Options

JTB supports several command-line options:

Option	Description
`-h`	Displays a help message including a table with brief descriptions of these options.
`-o NAME`	Specifies the filename JTB should use to output the annotated grammar rather than use the default `jtb.out.jj`.
`-np NAME`	Specifies the directory and package in which JTB should place the generated syntax tree classes, rather than the default `syntaxtree`. *Note:* for nested packages, JTB assumes the current directory is the one directly above the package stated. For example, if you used "`-np=foo.bar.bletch`", JTB will assume you are in the directory `foo/bar` and will generate a directory called `bletch` to store the node classes.
`-vp NAME`	Specifies the directory and package in which JTB should place the generated visitor classes, rather than the default `visitor`. The above note for the `-np` option applies to this option as well.
`-p NAME`	Shorthand for "`-np NAME.syntaxtree -vp NAME.visitor`".
`-si`	Reads input from standard input (typically the keyboard) rather than an input grammar file.
`-w`	JTB will no longer overwrite existing files.
`-e`	Suppresses JTB semantic error checking.
`-jd`	Generates JavaDoc-friendly comments in generated visitors and syntax tree classes.
`-f`	Generates descriptive node class child field names such as `whileStatement` and `nodeToken2` rather than `f0`, `f1`, etc.
`-ns NAME`	Specifies the name of the class (e.g. `mypackage.MyClass`) that all node classes should subclass. This class must be supplied by the user.
`-pp`	Generates parent pointers in all node classes as well as `getParent()` and `setParent()` methods. The parent reference of a given node will automatically be set when the node is passed to the constructor of another node. The root node's parent will be `null`.
`-tk`	Stores special tokens into the parse tree.
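The parent-pointer behaviour described for the `-pp` option can be approximated as in the sketch below. This is an illustration of the idea rather than the code JTB emits: the `Pair` production and the `depth()` helper are hypothetical, and `accept()` is omitted.

```java
// Stand-in for a Node interface generated with parent pointers; accept() is omitted.
interface Node {
   void setParent(Node n);
   Node getParent();
}

class NodeToken implements Node {
   private Node parent;
   public String tokenImage;
   public NodeToken(String s) { tokenImage = s; }
   public void setParent(Node n) { parent = n; }
   public Node getParent() { return parent; }
}

// Hypothetical two-child production; each child's parent is set in the constructor.
class Pair implements Node {
   private Node parent;
   public Node f0, f1;

   public Pair(Node n0, Node n1) {
      f0 = n0;
      f0.setParent(this);
      f1 = n1;
      f1.setParent(this);
   }

   public void setParent(Node n) { parent = n; }
   public Node getParent() { return parent; }
}

class ParentDemo {
   // Counts the ancestors of a node by following parent pointers up to the root.
   static int depth(Node n) {
      int d = 0;
      for (Node p = n.getParent(); p != null; p = p.getParent()) {
         d++;
      }
      return d;
   }

   public static void main(String[] args) {
      NodeToken leaf = new NodeToken("x");
      Pair inner = new Pair(leaf, new NodeToken("y"));
      Pair root = new Pair(inner, new NodeToken("z"));
      System.out.println(depth(leaf) + " " + depth(root));
   }
}
```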

Toolkit Options

In addition to the standard command-line options above, there are some "toolkit" command-line options for JTB. These cause JTB to automatically generate several visitors for specific purposes.

Scheme Tree Builder Toolkit

The -scheme option allows one to use JTB as a syntax tree builder for the Scheme programming language. This option does two things:

Generates records.scm in the current directory. This file is analogous to the syntaxtree directory in that it defines the syntax tree. The difference is that this file contains Scheme record definitions as opposed to Java classes.
Generates the visitor SchemeTreeBuilder.java into the visitor directory and package. This visitor will traverse a syntax tree and output the Scheme equivalent of the tree. The default output location is standard output, but this can be modified by passing the constructor a Java Writer object.

Several constraints must be placed on a JavaCC grammar if the Scheme option is to be used:

Choices such as ( x | y ) may only occur at the top level of a production. The choice may only be between two nonterminal symbols. For example, ( x y | z ) would be illegal.

( x )+

( y )*

[ z ]

[ x y ]

Printer Toolkit

The -printer option causes JTB to generate two additional visitors for aid in printing and formatting your syntax trees:

The visitor TreeDumper, when used on a syntax tree, will output (to standard output or another OutputStream or Writer of your choice) the tree based on the token location variables in each NodeToken. If you use this visitor on an unmodified syntax tree, it will print the tree out exactly as it was read in. It contains several methods:

`public void flushWriter()`	Flushes the OutputStream or Writer that TreeDumper is using to output the syntax tree.
`public void printSpecials(boolean b)`	Allows you to specify whether or not to print special tokens.
`public void startAtNextToken()`	Starts the tree dumper on the line containing the next token visited. For example, if the next token begins on line 50 and the dumper is currently on line 1 of the file, it will set its current line to 50 and continue printing from there, as opposed to printing 49 blank lines and then printing the token.
`public void resetPosition()`	Resets the position of the internal "cursor" to the first line and column. For example, if the internal cursor was at line twenty and the next token begins on line twenty-one, a single carriage return is output, then the token. If `resetPosition()` is called, the internal cursor will be reset to line 1. Twenty carriage returns would be output, then the token. When using a dumper on a syntax tree more than once, you either need to call this method or `startAtNextToken()` between each dump.

Using these methods, it is possible to only print certain parts of the tree. For example, to only print method signatures in the Java grammar, the following anonymous class could be used:

      root.accept(new DepthFirstVisitor() {
         public void visit(MethodDeclaration n) {
            dumper.startAtNextToken();
            n.f0.accept(dumper);
            n.f1.accept(dumper);
            n.f2.accept(dumper);
            n.f3.accept(dumper);
            // skip n.f4, the method body
            System.out.println();
         }
      });

The TreeFormatter visitor is a skeleton pretty printer template. It contains convenience methods to modify the token location information in each NodeToken:

`public TreeFormatter(int indentAmt, int wrapWidth)`	Allows you to specify the number of spaces per indentation level and the number of columns per line, after which tokens are wrapped to the next line (the default constructor assumes an `indentAmt` of 3 and a `wrapWidth` of 0, i.e. no line wrapping).
`protected void add(FormatCommand cmd)`	Use this method to add `FormatCommands` to the command queue to be executed when the next token in the tree is visited.
`protected FormatCommand force(int i)`	A Force command inserts one or more line breaks and indents the next line to the current indentation level. Without an argument, adds just one line break. Use `add(force());`
`protected FormatCommand indent()`	An Indent command increases the indentation level by one or more. Without an argument, just adds one indent level. Use `add(indent());`
`protected FormatCommand outdent()`	An Outdent command is the reverse of the Indent command: it reduces the indentation level. Use `add(outdent());`
`protected FormatCommand space()`	A Space command simply adds one or more spaces between tokens. Without an argument, adds just one space. Use `add(space());`
`protected void processList(NodeListInterface n, FormatCommand cmd)`	Visits each element of a `NodeList`, `NodeListOptional`, or `NodeSequence` and inserts an optional `FormatCommand` between each element (but not after the last one).

For example, this is how the CompilationUnit visit() method looks in the rewritten Java pretty printer example:

   /**
    * f0 -> [ PackageDeclaration() ]
    * f1 -> ( ImportDeclaration() )*
    * f2 -> ( TypeDeclaration() )*
    * f3 -> <EOF>
    */
   public void visit(CompilationUnit n) {
      if ( n.f0.present() ) {
         n.f0.accept(this);
         add(force(2));
      }

      if ( n.f1.present() ) {
         processList(n.f1, force());
         add(force(2));
      }
      if ( n.f2.present() ) {
         processList(n.f2, force(2));
         add(force());
      }
      n.f3.accept(this);
   }

Common Problems

New to JTB 1.1 is a semantic checking phase which looks for code which may be legal for JavaCC but may cause problems for JTB. Below is a description of the errors and warnings that are flagged by the new checker.

Errors

These are problems which will definitely cause JTB to choke. You must fix the problem or work around it in your input file before JTB can proceed.

Message	Description
Production "SomeProduction" has the same name as a JTB-generated class.	A production within the input grammar has a name which is reserved by JTB, such as `Node`, `NodeList`, etc.

Warnings

These are potential errors. Some code could cause JTB problems, but not in all cases. It is left up to your judgment as to whether or not to try to correct the code in question.

Message	Description
Javacode block must be specially handled.	See the Known Issues section of the Release Notes page.
Non-void return type in SomeProduction().	All productions in a grammar on which JTB is to be used should have a return type of `void`. JTB replaces all return types in the grammar upon processing.
Block of Java code in SomeProduction().	A production contains a block of embedded Java code. While it's possible this may not cause problems, the Java code could interact or interfere with the code JTB inserts into the grammar. A JTB grammar should ideally contain no embedded Java code.
Extra parentheses in SomeProduction().	A production contains extraneous parentheses (i.e. not enclosing a choice or followed by "*", "+", or "?"). This formerly caused JTB to misbehave, but this has been corrected for 1.1 (see the section on `NodeSequence`). However, to be safe, we are still flagging this so you are aware should any pesky lingering bugs still be present.

Still have questions? Suggestions on improving this document? Feel free to mail Wanjun Wang or Jens Palsberg.

Maintained by Wanjun Wang, wanjun@purdue.edu.

Created September 4, 1997.
Last modified June 26, 1999.