Published on

Defining ASTs with dynamic classes in Javascript

If you are working in a project that involves parsing you might be using a tool like ANTLR4 to generate a parse tree (PT). A common pattern to decouple the grammar from the business logic is to transform that PT into an AST (Abstract Syntax Seed). Or maybe you are using another kind of parser that doesn't provide a default tree, so you also need to define your own AST.

Regardless of the reason, if you need an AST you might have realized that defining the classes is tedious. So many node classes and each one with their own subtleties. Luckily in a language like Javascript this task can be simplified by using dynamic classes.

Warning: This post is highly subjective since it talks about style. Building the AST is not complex, it's just verbose. Depending on your use case this might not be necessary, this is just a method I found interesting

To keep the post simple I'll show small snippets of code at a time. If you are interested in seeing the complete code, you can see it here.

A glimpse of what we'll build

I'm going to focus on an AST that follows some specific requirements. If those requires imply repetitive or verbose code, the pattern will be more useful. But if they are simple, then this will be overengineering.

I'm going to use an AST where:

  1. Nodes are immutable. To protect the attributes I'll use underscored attributes with getters. I'll ignore the #attr syntax since it doesn't work with dynamic classes.
    • There must be a way of "modifying" a node by cloning it while changing an attribute.
  2. Nodes must implement an equals function.
  3. Every node optionally can have a location

With the method I'll explain you'll be able to define ASTs like this:

const Expr = AST.extend("Expr"); // This is simply to have a "node instanceof Expr"
const Literal = Expr.extend("Literal", ["value", "type"]);
const Variable = Expr.extend("Variable", ["name", "type"]);
const FunctionCall = Expr.extend("FunctionCall", ["name", "args"], {
    attrEquals: function(attr, first, second) {
        if (attr === "args") {
            if (first.length !== second.length) return false;
            return first.every((val, i) => val.equals(second[i]));
        }
        // If you have noticed the "this.super()" instead of just "super()", I'll explain later
        return this.super().attrEquals(attr, first, second);
    }
});

The nodes provide a default equality method, but it doesn't cover arrays so that's why FunctionCall defines an attrEquals method. I'll talk about that later.

With these definitions, you can instantiate nodes in the same order as the attributes in the array:

const ifThenElse = new FunctionCall("ifthenelse", [
    new FunctionCall("equal", [
        new Variable("foo", "integer"),
        new Literal(3, "integer")
    ]),
    new Literal(1, "integer"),
    new Literal(0, "integer")
]); 

Then you could "modify" the nodes using the with() method:

// This is a new object, it doesn't modify the previous one
const locatedIfThenElse = ifthenElse.with().location(new Location(startIndex, endIndex)).build();

const ifTheElseWithDifferentArguments = ifthenElse.with().args([
    new Literal(true, "boolean"),
    new Literal(2, "integer"),
    new Literal(5, "integer")
]).build();

Optional - Is it worth it? The trade-offs

This style is highly subjective and tailored to my specific requirements. Before implementing it consider whether it will be useful or merely add more complexity to your code. Even if you don't find it helpful for your current project, this method of dynamically creating classes is intriguing and has the potential to be applied to other topics beyond ASTs.

It's main advantage is concise and less repetitive code. It also allows to define new node classes quickly, but since you probably won't need to add new nodes frequently it's not a significant advantage.

The main drawback, however, is it adds more complexity. I'm probably not the only one that is a bit tired of all the "magic" involved in frameworks and libraries. It's incredible when it works, and drives me nuts when it doesn't. Even if you read this post and think "oh this is simple", consider what a colleague who hasn't read this post would think.

This approach isn't as complicated as, let's say, Java Spring bean magic. But it may still be overengineering, specially for simple ASTs.

As my teachers used to say: 'Don't overendigencen tje ssttem'

Step 1: The base class

The class AST is the basic building block for every node. It implements the logic for extending the nodes and every one will inherit from it. This extend method is also inherited by the child class, allowing to extend it too. In this way Expr.extends creates a new class that extends both AST and Expr.

class AST {
    static attributes = [];

    static extend(name, attributes=[], methods=[], staticFunctions=[]) {
        /* Code */
    }
}

This extend function returns a new class. Here is how it works: JS allows defining inline classes, plus they allow to inherit from this, which is the class where the function is called. So for AST.extend() this is AST, and for Expr.extend() it's Expr

static extend(name, attributes=[], methods=[], staticFunctions=[]) {
    // Here name doens't refer to the method argument. I'll fix that later
    const myClass = class name extends this {       
    };
    return myClass;
}

Of course this new class doesn't add anything. Now we need to extend it with the different arguments of the function:

myClass.attributes = attributes;

// Assign methods
Object.keys(methods).forEach(name => {
    Object.defineProperty(myClass.prototype, name, {value: methods[name]})
});
// Assign functions
Object.keys(staticFunctions).forEach(name => {
    Object.defineProperty(myClass, name, {value: staticFunctions[name]})
});

In order for methods like equals or the constructor to work I need to keep track of what attributes this class has. I decided to do this by storing them as a static attribute. Another way of handling this could have been defining the class as a string and then using eval to evaluate it, but I prefer this approach.

For each attribute I'll store the actual value in an attribute prefixed with an underscore, while the non-prefixed name will be a getter:

attributes.forEach(attr=> Object.defineProperty(myClass.prototype, attr, 
    {get: function() {return this["_"+attr]}}
));

And to add the final touch, I'll set the name of the class. While not strictly necessary, it can be useful for debugging:

Object.defineProperty (myClass, 'name', {value: name});

The constructor

Now that we've established the ability to extend a class, the next step is to be actually be able to instantiate it. The goal is to use the same order of arguments as specified in the class definition. For example, when you create a Literal class const Literal = Expr.extend("Literal", ["value", "type"]);, and initialize it using new Literal(x, y), it's expected that x corresponds to the value and y to the type.

The best way I found to achieve this is to implement a single constructor in the base class. The downside to this approach is that it doesn't allow to modify the behaviour of a specific constructor. For my use case this isn't necessary, but if it were, there are ways of tackling this problem.

class AST {
    static attributes = [];

    constructor() {
        this.constructor.attributes.forEach((x, i) => this["_" + x] = arguments[i]); 
    }
}

Insuring attribute inheritance

The current code has a problem: A child class must inherit the attributes of its parent class. And yes, we are using traditional inheritance, so this kind of happens, but not fully. The only way to give value to an attribute (without using the underscore "private" attribute) is to use the constructor, and the constructor only takes the attributes explicitly written by the programmer. That's a potential bug lurking in the shadows.

Consider this scenario: Every expression has a type

const Expr = AST.extend("Expr", ["type"]);

But then, you extend it and forget to add it

const Literal = Expr.extend("Literal", ["value"]);

So you wouldn't be able to set the type in the constructor. The best you could do is something like

const lit = Literal(3);
lit._type = "int";

Problems like this might not happen often, since in most scenarios ASTs don't heavily rely on inheritance in the traditional sense. For instance, in the example I made in the introduction Expr doesn't have any attribute because I only need it for an instanceof. The class existance is to avoid doing (node instanceof Literal || node instanceof FunctionCall || node instanceof Variable).

However, just because my specific example doesn't require it doesn't me we should overlook it. To address it I added a check when extending a class:

static extend(name, attributes=[], methods=[], staticFunctions=[]) {
    /* ... */
    //Makes sure that the child at least has all the attributes of the parent class
    this.attributes.forEach(attr => {
        if (!attributes.includes(attr)) throw new Error(`Expected '${name}' to include the property '${attr}' from it's parent '${this.name}'.`)
    });
}

Now the example above would throw an error.

Adding the location

As I said before, a minor requirement for my AST is that each node can optionally include an attribute location. This attribute can be used to find the piece of text that generated the node. Since it's optional and I don't want to be forced to add it to the constructor and propagate it to every single node extension, I decided to define it directly in the class and not inside the attributes array.

class AST {

    static attributes = [];
    _location;


    get location() {
        return this._location;
    }

This is just a small detail but we will need to take into account this location in other places, such as when adding the modifiers and the equals function.

Step 2: Modifiers

In my use case nodes are immutable, which is great for making the code more functional, but it's inconvenient when you actually need to modify it. That's why I wanted a simple way do so. Of course even though I called this "modifying" it actually creates a new node and applies the changes.

This modifications will follow the syntax:

const ifTheElseWithDifferentArguments = ifthenElse.with().args([
    new Literal(true, "boolean"),
    new Literal(2, "integer"),
    new Literal(5, "integer")
]).build();

Each node has an associated modifier class. The with() method returns an instance of this modifier. This class implements a setter method for each attribute, which are used to override the value of the modified node. When all the modifications have been set, the build() method will return the new instance of the node. If an attribute wasn't changed in the modifier, the new node will keep the value of the original.

First, let's define the Modifier class within the base AST. This one is straightforward, containing only the constructor, a location method and a reference to the node we are "modifying"

class AST {

    static Modifier = class {
        constructor(ref) {
            this._location = null;
            this._ref = ref;
        }

        location(loc) {
            this._location = loc;
            return this;
        }
    }
    
    /* ... */
}

Next, we need to extend the modifier. I'll define a static method _extendModifier that will be called in extend:

static extend(name, attributes=[], methods=[], staticFunctions=[]) {
    /* ... */
    this._extendModifier(this, myClass, attributes);
}

The code is quite similar to what we've already seen: Define a dynamic class that extends the parent modifier and define a setter method for each attribute.

static _extendModifier(parentClazz, childClazz, attributes) {
    childClazz.Modifier = class extends parentClazz.Modifier {
    };

    const mod = childClazz.Modifier;

    // Define the setter functions for every attribute of the class
    attributes.forEach(attr => {
        Object.defineProperty(mod.prototype, attr, {value: function(val) {this["_" + attr] = val; return this;}});
    });
}

The with() method of a node simply instances the modifier of its current class:

class AST {
    /* ... */
    with() {
        return new this.constructor.Modifier(this);
    }

The last piece of the puzzle is the build method. I prefered to keep the cloning logic inside AST so it simply calls a method of the node, though I could have implemented the logic directly in build().

static Modifier = class {
    /* ... */

    getAttr(attr) {
        return this["_" + attr] ?? this._ref[attr];
    }

    build() {
        return this._ref._cloneWithModifier(this);
    }
}

The getAttr method is the way of getting the value the cloned node will have. If the modifier has the attribute, use it. If it doesn't, get the atttribute from the original node.

Finally, the implementation of _cloneWithModifier is:

_cloneWithModifier(modifier) {
    const attrs = this.constructor.attributes;
    const cloned = new this.constructor(...attrs.map(x => modifier.getAttr(x)));
    cloned._location = modifier.getAttr("location");
    return cloned;
}

I wanted to call the constructor with the new attribute values just in case at some point I decide to add callbacks to the constructor. In case you wonder what's the other option, I guess you could call the constructor without arguments and then manually override the attributes using the underscored version. Not pretty.

Step 3: Node equality

The final requirement is to implement an equality method that can be configured with options. In this case I'll have a single option: Whether I want to compare locations or not.

equals(other, opts={}) {
    if (!other) return false;
    if (opts?.compareLocations && !this.location?.equals(other.location)) return false;
    if (other.prototype !== this.prototype) return false;
    return this.constructor.attributes.every(attr => this.attrEquals(attr, this[attr], other[attr]));
}

When comparing attributes I added the attrEquals method because I want to avoid boilerplate. The default attrEquals will handle basic cases, including strict equality and subnodes. But if the node has an attribute that needs a special type of equality, it can be added.

attrEquals(attr, first, second) {
    if (first === second) return true;
    if (first == null ||  second == null) return false;
    if (first.prototype !== second.prototype) return false;
    if (first instanceof AST) return first.equals(second);
    return false;
}

The attr parameter is not used in this default implementation. It's designed to be used to override the behaviour of specific attributes. For example the node FunctionCall has a list of arguments and each argument is a node of the tree.

const FunctionCall = Expr.extend("FunctionCall", ["name", "args"], {
    attrEquals: function(attr, first, second) {
        if (attr === "args") {
            if (first.length !== second.length) return false;
            return first.every((val, i) => val.equals(second[i]));
        }
        return this.super().attrEquals(attr, first, second);
    }
});

You might think the this.super() is a mistake, after all to call the parent method you use super().method(). The problem is that doesn't work when defining a method with Object.defineProperty. To overcome it I made a this.super() method that returns a reference to the parent prototype, so it can be use in the same way as super()

super() {
    return Object.getPrototypeOf(Object.getPrototypeOf(this));
}

I'm not gonna lie, this workaround is not pretty, but as I said at the beginning everything has its tradeoffs. If you have any better ideas, I'm interested.

The power of customization

In this post I've focused on using dynamic classes to build an AST that fulfills my specific requirements. But the beauty is that this "pattern" can be customized to suit your needs. Maybe you don't care about immutability but you want to have getX and setX methods. Maybe you want to be able to implement visitors for your AST.

accept(visitor, ctxt) {
    return visitor["visit" + this.constructor.name](ctxt);
}

with just 2 lines of code (let's not count the } 😉) we can already define one

const myVisitor = {
    visitFunctionCall: () => console.log("IT WORKS🎉🎉🎉")
};

ifThenElse.accept(myVisitor);

All I wanted to do in this post is to show that this is an option and that it's valuable. If you also find it useful, feel free to adapt it.


Finally, after 16 posts and two years of sparse blogging I've finally written a post about the thing that gives name to this blog 😎