Processing Data in Tree Structures

In Vex there exist several tree structures, the most prominent are the XML DOM (semantics) and the box model (representation). Both structures consist of a heterogeneous set of node types which serve different purposes. The one thing that all nodes in a tree have in common is the responsibility to form the tree structure itself. Most tasks that are not related to the structure only involve a subset of node types.

The simplest thing that might work

Let’s start with a simple example: you have an arbitrary node and want to get the text it contains. This obviously works only for nodes of type IText because all other nodes do not contain any text. A very simple approach would be the use of instanceof:

public String getNodeText(INode node) {
    if (node instanceof IText) {
        return ((IText) node).getText();
    } else {
        return "";
    }  
}

This approach is using reflection, which has kind of a bad reputation, but in the end, it does what it should do. For this simple use case it is an appropriate solution.

There is one thing you should keep in mind: when using instanceof you need discipline. Since it is a very generic mechanism, it is easy to mix up different concerns in one place. Imagine a class TableCellNode which implements several interfaces INode, ITableCell and ITextContainer. There is nothing that prevents you from adding just another little if node instanceof... to handle things concerning the node structure in the same breath with things related to table cells or text containers. Clearly a “crack in the pane”.

Polymorphism

Going to the other extreme, you can use polymorphism. This means, there is this generic getNodeText method with a contract, that defines its external behavior. Each and every node type then has to implement getNodeText according to the contract. In our example, most nodes will just return the empty String.

Polymorphism is a very strong mechanism that comes with object orientation. It provides a way to diversify behavior by setting up a custom type system. The key is, that all members of this type system share the same responsibilities, but fulfill them in a different way.

In our example this would mean that each node type has a useful textual representation. But this is not true. The responsibility shared among node types is to build the tree structure. Only one type, IText, has the special feature to provide textual content. Code related to the textual content will only be interested in nodes of type IText. Putting the getNodeText method into the base of the node types breaks the SRP and is not the best solution for our problem. But if the required functionality was related to the tree structure (the common responsibility of all nodes), polymorphism would have been a proper way to go.

The Visitor pattern

We have seen so far that you can either implement the functionality inside your types using polymorphism, or outside using some kind of type checking mechanism like instanceof. What you do highly depends if the functionality you want to implement affects all of your node types, or only some of them.

Another, more object oriented approach to do type checking is the well known Visitor pattern. It allows you, to implement type specific behavior in dedicated methods. Ideally you implement a visitor class for one single purpose. So instead of adding more and more methods with different responsibilities to each node types, you get more and more visitor classes. Each one contains only the type specific behavior related to one responsibility.

How would our example look like, using the Visitor pattern? As a first step, we will implement it as simple as possible, then refine this solution step by step:

public String getNodeText(INode node) {
    final String[] nodeText = new String[1];
    nodeText[0] = "";
    node.accept(new INodeVisitor() {
        public void visit(IText text) {
            nodeText[0] = text.getText();
        }
        // ... here follows one empty visit method for each other node type
    });
    return nodeText[0];
}

Wow. Not really simple, isn’t it? Because we are using an anonymous implementation of INodeVisitor, we have to use a common trick to get information out of the anonymous class into the getNodeText method. This adds clutter. And we have a lot of empty method implementations (which I left out), that add even more clutter and obfuscate the original purpose of the method.

We will attack the superfluous empty visit methods first. To get rid of the, we just create an implementation of INodeVisitor that contains only empty implementations of all visit methods:

public class BaseNodeVisitor implements INodeVisitor {
    public void visit(IText text) {}
    // ... here follows one empty visit method for each other node type
}

This class is then used to derive an anonymous subclass in place of the anonymous implementation of INodeVisitor:

public String getNodeText(INode node) {
    final String[] nodeText = new String[1];
    nodeText[0] = "";
    node.accept(new BaseNodeVisitor() {
        @Override
        public void visit(IText text) {
            nodeText[0] = text.getText();
        }
    });
    return nodeText[0];
}

That was easy. But how to get rid of this array? Therefor we have to extend the vanilla visitor implementation a little bit. We need an interface that provides methods which return a result:

public interface INodeVisitorWithResult&lt;T&gt; {
    T visit(IText text);
    // ... here follows one visit method for each other node type
}

Of course to be able to use this, INode needs an accept method which also returns the result of the visitor, exemplary implemented for IText:

public &lt;T&gt; T accept(INodeVisitorWithResult&lt;T&gt; visitor) {
    return visitor.visit(this);
}

Equipped with this and a base implementation we can now remove the tricky array from our solution:

public String getNodeText(INode node) {
    return node.accept(new BaseNodeVisitorWithResult&lt;String&gt;() {
        @Override
        public void visit(IText text) {
            return text.getText();
        }
    });
}

So that’s it. Surely more complicated than instanceof. You need to understand the Visitor pattern, you need to understand anonymous implementation in Java, you need to know about the base implementation of your visitor interface. In exchange you get all the amenities of Java’s static type system. For example, with support of your IDE, you can easily find all places where type specific behavior is implemented. Casting is not necessary, because Java’s dynamic dispatching is taking care of routing your request to the right visit method.

Traversal

Traversal is a very common set of functionality needed when working with tree structures. Depending on the task at hands there are various ways to traverse a tree: breadth-first, depth-first in pre-order, in-order, or post-order, and a lot more. The Visitor pattern allows you to implement a base class that provides the generic traversal functionality you need (e.g. depth-first pre-order). To put it in use, you then just need to subclass it and implement the type specific behaviour by overriding the visit methods.

For a detailed example, see DepthFirstTraversal and how it is used in various locations to collect information from Vex’s box model.

switch..case

The well known alternative to the Visitor pattern is using switch..case in combination with a set of constants representing the possible types. For example the W3C’s DOM supports this kind of type checking:

public String getNodeText(org.w3c.dom.Node node) {
    switch node.getNodeType() {
    case Node.TEXT_NODE:
        return ((Text) node).getWholeText();
    default:
        return "";
    }
}

Here you need casting again, because, except the use of reflection, we are technically very close to instanceof. If you use an enum instead of int to define the type constants, you can make sure that nothing gets mixed up with implementing type specific behaviour for your polymorphic node type.

switch..case vs. Visitor

switch..case is much easier to understand than the Visitor pattern. You don’t need the anonymous implementations and all this small Visitor classes. The type constants replicate the polymorphic type system. This is, on the one hand, duplicate information (in the sense of DRY). On the other hand you get more flexibility. The Visitor pattern works only for the leaf types of type hierarchies. You cannot have a visit method for an intermediate type T and at the same time visit methods for subclasses of T. With type constants, any combination is possible.

Visitor instances are objects while type specific behaviour using switch..case (or instanceof) is usually implemented in form of a method. This means a Visitor instance is easier to combine with other patterns (like Chain of Responsibility or Command) to build behaviour dynamically at runtime. Visitor objects are also able to carry state information, if your algorithm requires this.

Choose wisely

As usual in software development, you should not rely on a golden hammer. Use the right tool instead. Having some alternatives and knowing their pros and cons always makes your life easier.