Documentation for JTB 1.0


The Java Tree Builder (JTB) is a tool used to automatically generate syntax trees with the Java Compiler Compiler (JavaCC) parser generator.  This page explains how to use JTB.  If you are unfamiliar with the Visitor design pattern, please see the section entitled "Why Visitors?".  We also recommend that you read the Tutorial before using JTB.

If you are converting to a newer version of JTB, please see the release notes for the new version.

Overview of Generated Files

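To begin using JTB, run it with your grammar file as an argument (a list of command-line parameters can be obtained by running JTB without any arguments).  This will generate the following:

jtb.out.jj The input grammar file with syntax tree building code added
syntaxtree A directory containing the generated syntax tree node classes
visitor A directory containing the generated Visitor superclass

To generate your parser, run JavaCC using jtb.out.jj as the grammar file.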

Let's take a look at all the files and directories JTB generates.


The file jtb.out.jj is the same as the input grammar file except that it now contains code for building the syntax tree during the parse.  Typically, this file can be left alone after generation.  The only thing that needs to be done with it is to run it through JavaCC to generate your parser.


The syntaxtree directory contains syntax tree node classes generated from the productions in your JavaCC grammar.  Each production has its own class.  If your grammar contains 97 productions, this directory will contain 97 classes (plus the special automatically generated nodes, which are discussed later), with names corresponding to the left-hand side names of the productions.  Like jtb.out.jj, these files don't need to be edited after generation.  Generate them once, compile them once, and forget about them.

Let's examine one of the classes generated from a production.  Take, for example, the ImportDeclaration production from the Java1.1.jj grammar.

This production will generate the file ImportDeclaration.java in the directory (and package) syntaxtree.  Let us now examine this file from the top down.
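Since the walkthrough that follows refers to specific parts of the generated file, a rough single-file approximation may help.  It is only a sketch built from the description on this page, not actual JTB output.  The production is assumed to be "import" Name() [ "." "*" ] ";"; the header and comment wording, the constructor parameter names, and the stand-in classes Node, Visitor, NodeToken, NodeOptional, Name and TokenCollector are assumptions made so that the example compiles by itself.

```java
// Single-file approximation. A real JTB run writes one class per file, with
// ImportDeclaration in package syntaxtree and Visitor in package visitor.
class ImportDeclarationSketch {
    public static void main(String[] args) {
        System.out.println(tokens());   // prints "import ;"
    }

    // Builds a node by hand (normally the parser does this) and visits it.
    static String tokens() {
        ImportDeclaration d = new ImportDeclaration(
            new NodeToken("import"), new Name(), new NodeOptional(), new NodeToken(";"));
        TokenCollector v = new TokenCollector();
        d.accept(v);
        return v.out.toString().trim();
    }
}

// Minimal stand-ins (assumed shapes) so that the sketch compiles alone.
interface Node { void accept(Visitor v); }

class Visitor {
    public void visit(NodeToken n) { }
    public void visit(NodeOptional n) { }
    public void visit(Name n) { }
    // Default behaviour: visit each child in order.
    public void visit(ImportDeclaration n) {
        n.f0.accept(this);
        n.f1.accept(this);
        n.f2.accept(this);
        n.f3.accept(this);
    }
}

class NodeToken implements Node {
    public String tokenImage;
    public NodeToken(String s) { tokenImage = s; }
    public void accept(Visitor v) { v.visit(this); }
}

class NodeOptional implements Node {
    public void accept(Visitor v) { v.visit(this); }
}

class Name implements Node {
    public void accept(Visitor v) { v.visit(this); }
}

//
// Generated by JTB 1.0 (header wording assumed)
//

/**
 * Grammar production (assumed):
 * f0 -> "import"
 * f1 -> Name()
 * f2 -> [ "." "*" ]
 * f3 -> ";"
 */
class ImportDeclaration implements Node {
    // One public field per part of the right-hand side, named f0 upwards.
    public NodeToken f0;
    public Name f1;
    public NodeOptional f2;
    public NodeToken f3;

    // Called from the tree-building actions in the annotated grammar.
    public ImportDeclaration(NodeToken n0, Name n1, NodeOptional n2, NodeToken n3) {
        f0 = n0; f1 = n1; f2 = n2; f3 = n3;
    }

    // Entry point for visitors.
    public void accept(Visitor v) { v.visit(this); }
}

// A visitor that overrides only the token method and records what it sees.
class TokenCollector extends Visitor {
    StringBuilder out = new StringBuilder();
    @Override public void visit(NodeToken n) { out.append(n.tokenImage).append(' '); }
}
```

In this sketch the optional part is left empty, so only the two tokens are collected.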

The first set of comments obviously shows which version of JTB created this file.  The second group of comments is for your benefit, showing the names of the fields of this class (the children of the node), and what parts of the original production they represent.   All parts of a production are represented in the tree, including tokens.

Notice that this class is in the package "syntaxtree".  Separating the generated tree node classes into their own package greatly simplifies file organization, particularly when the grammar contains a large number of productions.  If the grammar is stable and not subject to change, then once these classes are generated and compiled it is not necessary to pay them any more attention.  All of the work is done in the visitor classes.

Next you'll note that this class implements an interface named Node.  This is one of seven tree node types (one interface and six classes) automatically generated for every grammar.  They are as follows:

Automatically-Generated Tree Node Interface and Classes
Node Node interface that all tree nodes implement.
NodeChoice Represents a grammar choice such as ( A | B )
NodeList Represents a list such as ( A )+
NodeListOptional Represents an optional list such as ( A )*
NodeOptional Represents an optional such as [ A ] or ( A )?
NodeSequence Represents a nested sequence of nodes
NodeToken Represents a token string such as "package"
These will be discussed in greater detail below.

Next come the member variables of the ImportDeclaration class.  These are generated based on the RHS of the production.  Their types depend on the various items in the RHS, and their names begin with f0 and work their way up.  You may be wondering why these variables are declared as public.  Since the visitors which must access these fields reside in a different package than the syntax tree nodes, package visibility cannot be used.  We decided that breaking encapsulation was a necessary evil in this case.

The next portion of the generated class is the constructor.  It is called from the tree-building actions in the annotated grammar so you will probably not need to use it.

Last is the accept() method.  This method is the way in which visitors interact with the class.


The visitor directory contains the generated Visitor superclass and is where the visitors you write can be placed as well.  The Visitor class contains one method per production in the grammar, plus one method for each of the six automatically-generated classes.  These default methods simply visit each node of the tree, calling the accept() method of each node's children.

Our intent is for the programmer to override only those methods for which specific actions must be performed.  For example, in a visitor which simply counts the number of assignment statements in a Java source file, only the method visit(Assignment n) would need to be overridden.
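The counting visitor described above can be sketched as follows.  This is a minimal single-file illustration, not generated code: the Visitor stand-in has only two methods, Block is a hypothetical container node invented for the example, and Assignment is reduced to an empty class.  In a real project Visitor would be the generated class in package visitor, and only AssignmentCounter would be written by hand.

```java
import java.util.ArrayList;
import java.util.List;

class AssignmentCountDemo {
    public static void main(String[] args) {
        System.out.println(count(sampleTree()));   // prints 3
    }

    // A small hand-built tree: two assignments at the top level, one nested.
    static Node sampleTree() {
        Block inner = new Block(new Assignment());
        return new Block(new Assignment(), inner, new Assignment());
    }

    static int count(Node root) {
        AssignmentCounter c = new AssignmentCounter();
        root.accept(c);
        return c.count;
    }
}

interface Node { void accept(Visitor v); }

// Stand-in for the generated Visitor superclass: the default methods only
// traverse, calling accept() on each child.
class Visitor {
    public void visit(Block n) {
        for (Node child : n.children) child.accept(this);
    }
    public void visit(Assignment n) { }
}

// Hypothetical container node, used only to give the tree some depth.
class Block implements Node {
    public List<Node> children = new ArrayList<>();
    public Block(Node... kids) {
        for (Node k : kids) children.add(k);
    }
    public void accept(Visitor v) { v.visit(this); }
}

class Assignment implements Node {
    public void accept(Visitor v) { v.visit(this); }
}

// The only hand-written visitor: overrides a single method.
class AssignmentCounter extends Visitor {
    public int count = 0;

    @Override
    public void visit(Assignment n) {
        count++;
        super.visit(n);   // keep the default traversal of the node's children
    }
}
```

Calling super.visit(n) after counting keeps the default traversal, which matters when the node being counted can itself contain further nodes of interest.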

Continuing our example above, consider the visit(ImportDeclaration n) method of class Visitor.

The comments above each visit method are for the programmer's benefit, showing which field corresponds to which part of the production.  In this example, n.f0 is a reference to one of the automatically generated classes, NodeToken.  n.f1 refers to a nonterminal of type Name.  n.f2 refers to a NodeOptional which stores a NodeSequence (more on this later).  n.f3 refers to another NodeToken.

The Automatically Generated Classes

Six classes and an interface are automatically generated for every grammar file.  The six classes are responsible for the various EBNF grammar constructs such as ( )+, ( )*, ( )?, etc.


The interface Node is implemented by all syntax tree nodes.  It declares the accept() method, which all tree node classes implement.  In all of the generated classes, the accept() method simply calls the corresponding visit(XXXX n) method of the visitor passed to it, where XXXX is the name of the node class.  Note that the visit() methods are overloaded, i.e. the distinguishing feature is the argument each takes, as opposed to its name.
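The interplay between accept() and the overloaded visit() methods can be illustrated with a small sketch.  The Node interface below has the shape implied by this page (a single accept() method taking a visitor); its exact declaration in generated code may differ.  Name and NodeToken are simplified stand-ins, and LabelVisitor is a hypothetical visitor written for the example.

```java
class DoubleDispatchDemo {
    public static void main(String[] args) {
        System.out.println(dispatch(new Name()));           // prints "Name"
        System.out.println(dispatch(new NodeToken(";")));   // prints "NodeToken:;"
    }

    // The caller only knows it has a Node; accept() selects the right overload.
    static String dispatch(Node n) {
        LabelVisitor v = new LabelVisitor();
        n.accept(v);
        return v.label;
    }
}

// Assumed shape of the Node interface.
interface Node { void accept(Visitor v); }

// Overloaded visit methods: same name, distinguished by argument type.
class Visitor {
    public void visit(Name n) { }
    public void visit(NodeToken n) { }
}

class Name implements Node {
    // Inside Name, "this" has static type Name, so visit(Name) is chosen.
    public void accept(Visitor v) { v.visit(this); }
}

class NodeToken implements Node {
    public String tokenImage;
    public NodeToken(String s) { tokenImage = s; }
    // Inside NodeToken, "this" has static type NodeToken, so visit(NodeToken) is chosen.
    public void accept(Visitor v) { v.visit(this); }
}

// Hypothetical visitor that records which overload was reached.
class LabelVisitor extends Visitor {
    String label = "";
    @Override public void visit(Name n) { label = "Name"; }
    @Override public void visit(NodeToken n) { label = "NodeToken:" + n.tokenImage; }
}
```

Because overload resolution happens at compile time, each class needs its own accept() body even though the text of the method is identical everywhere.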


NodeChoice is the class which JTB uses to represent choice points in a grammar.  For example, JTB would represent a production ResultType whose right-hand side is a single choice as a class ResultType with a single child of type NodeChoice.  The type of the node stored in this NodeChoice is not determined until the input is actually parsed.  The stored node is accessible through the choice field.  Since choice is of type Node, typecasts are sometimes necessary to access the fields of the node stored in a NodeChoice.

Another feature of NodeChoice is the field which.  When determining which of the choices was selected, one option is to use a series of if statements with instanceof.  We found this to be quite messy, so we added the which field.  If the first choice is selected, which equals 0 (following the old programming custom of counting from 0).  If the second choice is taken, which equals 1; for the third, 2; and so on.  Be careful: code that relies on which will break if you later change the order of the choices in the grammar.
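A short sketch of reading a NodeChoice through which and choice follows.  Only the two field names come from this page; the constructor signature, the hypothetical grammar choice ( "void" | Type() ), and the simplified NodeToken and Type classes are assumptions, and accept() is omitted to keep the example short.

```java
class NodeChoiceDemo {
    public static void main(String[] args) {
        System.out.println(describe(new NodeChoice(new NodeToken("void"), 0)));  // token void
        System.out.println(describe(new NodeChoice(new Type("int"), 1)));        // type int
    }

    // Branch on which, then cast choice to the type that alternative produces.
    static String describe(NodeChoice c) {
        switch (c.which) {
            case 0:
                return "token " + ((NodeToken) c.choice).tokenImage;
            case 1:
                return "type " + ((Type) c.choice).name;
            default:
                // Guards against the grammar being reordered or extended.
                throw new IllegalStateException("unexpected choice " + c.which);
        }
    }
}

interface Node { }

// Simplified stand-in; the constructor signature is an assumption.
class NodeChoice implements Node {
    public Node choice;   // the node that was actually parsed
    public int which;     // zero-based index of the alternative taken
    public NodeChoice(Node node, int whichChoice) {
        choice = node;
        which = whichChoice;
    }
}

class NodeToken implements Node {
    public String tokenImage;
    public NodeToken(String s) { tokenImage = s; }
}

// Hypothetical nonterminal class, reduced to a single string field.
class Type implements Node {
    public String name;
    public Type(String n) { name = n; }
}
```

The default branch is one way to turn a silently wrong index into an immediate failure when the order of the alternatives changes.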


NodeList is the class used by JTB to represent lists.  For example, JTB would represent a production ArrayDimensions() consisting of a ( )+ list followed by a ( )* list as a class ArrayDimensions with children of type NodeList and NodeListOptional respectively.  A NodeList uses a java.util.Vector to store its list of nodes.  As with NodeChoice, typecasts may occasionally be necessary to access fields of nodes contained in the list.


NodeListOptional is very similar to NodeList except that it represents optional lists such as ( A )*, where the list may or may not appear in the input.  The only difference between this class and NodeList is the method present(), which returns true if the list is present and false if it is not.
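Walking an optional list might look roughly like the sketch below.  Only present() and the use of a Vector are taken from this page; the field name nodes and the methods addNode(), elements() and size() are assumed names for the stand-in class, and accept() is omitted for brevity.

```java
import java.util.Enumeration;
import java.util.Vector;

class NodeListDemo {
    public static void main(String[] args) {
        NodeListOptional empty = new NodeListOptional();
        NodeListOptional brackets = new NodeListOptional();
        brackets.addNode(new NodeToken("["));
        brackets.addNode(new NodeToken("]"));
        System.out.println(join(empty));      // prints "(absent)"
        System.out.println(join(brackets));   // prints "[]"
    }

    // Concatenates the token images in the list, if the list is present.
    static String join(NodeListOptional list) {
        if (!list.present()) return "(absent)";
        StringBuilder sb = new StringBuilder();
        for (Enumeration<Node> e = list.elements(); e.hasMoreElements(); ) {
            // Elements are stored as Node, so a typecast is needed.
            NodeToken t = (NodeToken) e.nextElement();
            sb.append(t.tokenImage);
        }
        return sb.toString();
    }
}

interface Node { }

class NodeToken implements Node {
    public String tokenImage;
    public NodeToken(String s) { tokenImage = s; }
}

// Simplified stand-in; method and field names other than present() are assumed.
class NodeListOptional implements Node {
    public Vector<Node> nodes = new Vector<>();
    public void addNode(Node n) { nodes.addElement(n); }
    public Enumeration<Node> elements() { return nodes.elements(); }
    public int size() { return nodes.size(); }
    public boolean present() { return nodes.size() != 0; }
}
```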


NodeOptional is used to store optional constructs such as [ A ] or ( A )?.  JTB would represent the ImportDeclaration production discussed earlier as a class ImportDeclaration with children of types NodeToken, Name, NodeOptional, and NodeToken.  This class stores the optional as a Node reference, so once again, typecasts may be necessary to access fields of an optional.  Here, present() works the same way as in NodeListOptional.


By now you might be wondering how an automatic class like NodeOptional handles a construct with more than one node in it, such as [ "extends" Name() ].  This is accomplished with a NodeSequence, a class used to represent nested sequences of nodes.  For the above construct, JTB would generate a NodeOptional, and inside it would be a NodeSequence with the NodeToken "extends" as the 0th element and Name() as the first.  The interface for NodeSequence is identical to that of NodeList.
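Unpacking the optional [ "extends" Name() ] described above could be sketched as follows.  This is an illustration under assumptions: present() comes from this page, while the field name node, the NodeSequence methods addNode() and elementAt(), and the string field on the simplified Name class are invented for the example.

```java
import java.util.Vector;

class NodeOptionalDemo {
    public static void main(String[] args) {
        NodeSequence seq = new NodeSequence();
        seq.addNode(new NodeToken("extends"));
        seq.addNode(new Name("Base"));
        System.out.println(superclassOf(new NodeOptional(seq)));   // prints "extends Base"
        System.out.println(superclassOf(new NodeOptional()));      // prints "(none)"
    }

    // Check present() first, then cast down through the nested sequence.
    static String superclassOf(NodeOptional opt) {
        if (!opt.present()) return "(none)";
        NodeSequence seq = (NodeSequence) opt.node;
        NodeToken keyword = (NodeToken) seq.elementAt(0);   // "extends"
        Name name = (Name) seq.elementAt(1);                // Name()
        return keyword.tokenImage + " " + name.text;
    }
}

interface Node { }

class NodeToken implements Node {
    public String tokenImage;
    public NodeToken(String s) { tokenImage = s; }
}

// Hypothetical simplification: a real Name class has fields f0, f1, ...
class Name implements Node {
    public String text;
    public Name(String t) { text = t; }
}

// Stand-in; the field name "node" and both constructors are assumptions.
class NodeOptional implements Node {
    public Node node;
    public NodeOptional() { node = null; }
    public NodeOptional(Node n) { node = n; }
    public boolean present() { return node != null; }
}

// Stand-in; method names are assumed.
class NodeSequence implements Node {
    public Vector<Node> nodes = new Vector<>();
    public void addNode(Node n) { nodes.addElement(n); }
    public Node elementAt(int i) { return nodes.elementAt(i); }
}
```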


NodeToken is the class JTB uses to store all tokens in the tree.  The tokens are simply stored as strings.  The field tokenImage can be accessed directly, but toString() will automatically be called if you attempt to print a NodeToken.

JTB Command-Line Options

JTB supports several command-line options:
Option Description
-h Displays a help message including a table with brief descriptions of these options.
-o=NAME Specifies the filename JTB should use for the annotated grammar instead of the default, jtb.out.jj.
-np=NAME Specifies the directory and package in which JTB should place the generated syntax tree classes instead of the default, syntaxtree.

Note: for nested packages, JTB assumes the current directory is the one directly above the package stated.  For example, if you use "-np=foo.bar.bletch", JTB will assume you are in the directory foo/bar and will generate a directory called bletch to store the node classes.

-vp=NAME Specifies the directory and package in which JTB should place the generated visitor classes instead of the default, visitor.  The above note for the -np option applies to this option as well.
-si Reads input from standard input (typically the keyboard) rather than an input grammar file.
-vn Causes JTB to not use overloaded visit() methods.  For a nonterminal NT, the visit method will be visitNT() rather than visit().  This option allows backward compatibility with versions of JTB before 1.0.

In addition, there are several command-line options for debugging JTB.  Most users will probably not need them, but should the need arise, the -h option explains them.

Still have questions?  Suggestions on improving this document?  Feel free to mail Wanjun Wang or Jens Palsberg.


Maintained by Wanjun Wang, wanjun@purdue.edu.
Created January 6, 1999. 
Last modified June 26, 1999.