Building an Interpreter
- Please download the homework from here
Overview
For this project, you will write an interpreter for a small programming language similar to JavaScript (not TypeScript). To write an interpreter, you need a parser (provided to you inside of ./include/parser.ts
) to turn the program's concrete syntax (a string of characters) into an abstract syntax tree. You will need to traverse the AST making necessary checks and executing corresponding statements and expressions.
Learning Objectives
- Learn fundamentals of programming language implementation
- How to read the grammar for a concrete syntax
- How to work with and read abstract syntax trees
Student Expectations
Students will be graded on their ability to:
- Correctly implement the functions specified below
- Resolve all linter warnings
- Follow the coding, bad practice and testing guidelines
- Design full-coverage unit-tests for the functions they implemented
- See the testing guidelines on coverage for more details
Getting Started
Concrete Syntax
The following (simplified) grammar describes the concrete syntax of the fragment of JavaScript that you will be working with:
Numbers n ::= ... base-10 numbers
Variables x ::= ... variable name, a sequence of alphabetic letters
Expressions e ::= n numeric constant
| true boolean value true
| false boolean value false
| x variable reference
| e_1 + e_2 addition
| e_1 - e_2 subtraction
| e_1 * e_2 multiplication
| e_1 / e_2 division
| e_1 && e_2 logical AND
| e_1 || e_2 logical OR
| e_1 < e_2 less than
| e_1 > e_2 greater than
| e_1 === e_2 equal to
Statements s ::= let x = e; variable declaration
| x = e; assignment
| if (e) b_1 else b_2 conditional
| while (e) b loop
| print(e); display to console
Blocks b ::= { s_1 ... s_n }
Programs p ::= s_1 ... s_n
Some nonterminals (like Numbers and Variables) have been described in words for simplicity. The actual grammar is defined inside of ./include/grammar.pegjs
. You may take a look at it if you are curious, but you should not need to.
Each line of the grammar defines a rule. As an example, the rule
Expressions e ::= n numeric constant
| true boolean value true
| false boolean value false
| e_1 + e_2 addition
| e_1 && e_2 logical AND
would read as: An expression, labeled as e
, may be one of:
n
, a number (as defined above)true
, the boolean value truefalse
, the boolean value false- An expression, , a plus symbol, followed by another expression , for addition
- An expression, , two ampersands (
&&
), followed by another expression , for logical AND
Parser
We have provided two parsing functions, defined in ./include/parser.ts
, the function parseExpression
parses an expression and the function parseProgram
parses a program (a series of statements). Their type signatures are outlined below:
type BinaryOperator = "+" | "-" | "*" | "/" | "&&" | "||" | ">" | "<" | "===";
type Expression =
| { kind: "boolean"; value: boolean }
| { kind: "number"; value: number }
| { kind: "variable"; name: string }
| { kind: "operator"; operator: BinaryOperator; left: Expression; right: Expression };
type Statement =
| { kind: "let"; name: string; expression: Expression }
| { kind: "assignment"; name: string; expression: Expression }
| { kind: "if"; test: Expression; truePart: Statement[]; falsePart: Statement[] }
| { kind: "while"; test: Expression; body: Statement[] }
| { kind: "print"; expression: Expression };
function parseExpression(expression: string): Expression;
function parseProgram(statements: string): Statement[];
On success, these functions will return an object that contains the the corresponding abstract syntax tree (AST) for the given string. On failure, these functions throw an error with a reason: the string cannot be parsed.
Parsing and interpreting are separate stages of a programs execution. The interpreter depends on the parser to construct a valid AST. If the parser fails, then it is considered an unrecoverable failure and proceeding stages, such as interpreting (or linting/formatting if we were writing those tools), cannot run. You are not expected to cover input that the parser rejects.
State
The State type is defined as follows:
type RuntimeValue = number | boolean;
const PARENT_STATE_KEY = Symbol("[[PARENT]]");
export type State = { [PARENT_STATE_KEY]?: State, [key: string]: RuntimeValue }
This notation indicates that a State
object has a variable number of properties with values of type number
or boolean
(representing values of variables that are in scope), and an optional property with a specific symbol key and a value of type State
(link to the parent scope). We have used a symbol
(a primitive type different from string
) to avoid collision with variable names and enable easy type inference (see below).
A block starts a new inner scope. A variable declared in a block will shadow an outer declaration (any variable use will refer to the inner declaration). On exiting a scope, variables declared there are no longer accessible (since we don't have closures). Thus, they should not be in the global state at the end. The nesting of block scopes corresponds to a stack, which you can implement as a linked list, by adding to your State
object a link to an outer scope. Since the link is just another property, this allows all functions to keep their signatures. To ensure the link name does not clash with a program variable, use a property name that is not an identifier (see given: PARENT_STATE_KEY
). The global state cannot have extra properties, and does not need a link, as the last state on the list.
Accessing Values from State
The State
type maps arbitrary strings to a RuntimeValue
or the symbol PARENT_STATE_KEY
to another State
type.
const variableName = "foo";
const value = state[variableName]; // Compiler infers that `value` is a RuntimeValue (or `undefined`)
const parent = state[PARENT_STATE_KEY]; // Compiler infers that `parent` is a State (or `undefined`)
Behavior
The behavior of our interpreter should be similar to the node
interpreter in "strict mode" (with some exceptions). To test what your interpreter should do in a scenario, you may use the node --use-strict
command in a terminal to open a Read Eval Print Loop (REPL). This interface will allow you to input statements and expressions and will display an error or the evaluated result.
Exceptions:
- Arithmetic and greater/less-than comparison may only happen between numbers
- Logical operations should short-circuit. Evaluated operands must be boolean values
- Division by zero is forbidden
- Additional checks to emulate
ReferenceError
behavior are unneeded- This would require an additional pass prior to interpreting to ensure variables are not used before declaration
- See MDN on Hoisting if you are curious
Error Handling
An interpreter can generally not continue meaningfully after an error (as opposed to compilers). Thus, if you find an error, you should throw an error, using an informative error message (i.e. "Arithmetic may only happen between numbers"). You need to do a number of checks (e.g., correct typing, and missing or duplicate declarations).
Optional: Interpreting Functions
As extra (uncredited) practice, you can implement first-class functions inside of your interpreter. We will extend the grammar to include function expressions, call expressions, return statements, and expression statements (1 + 1;
) - to support both f();
and f(g());
:
e ::=
...
| function (x1 ... xn ) b Function expressions
| x(e1 ... en ) Call expression
s ::=
...
| e; Expression statements
| return e; Return statements
The parser already supports these constructs. You may look at types inside ./include/parser.ts
. Here are some hints for what needs to be updated in ./src/interpreter.ts
:
- Add a new type of
RuntimeValue
to support functions/closures- Define an
interface
with the values you might think you'd need
- Define an
- Give
interpStatement
a return type other thanvoid
Rules:
- A function's body is only evaluated when called
- Functions capture the environment they were created in
- There may not be duplicate parameter names
- Providing more, or fewer, arguments than there are parameters is considered a runtime error
- All functions must explicitly return a value (number, boolean, or another function)
- If a function has not explicitly returned after executing its body it is a runtime error
Programming Tasks
Your task is to implement the following functions inside of ./src/interpreter.ts
. You may do them in any order. Reviewing lecture slides might be helpful.
You may not use eval
or anything similar (new Function(...)
).
The most common approach to these functions is using a switch
statement to match Statement
or Expression
variants with their desired behavior. This is an acceptable approach, but looks awkward syntactically. If you'd like, you can use the ts-pattern library to do structural pattern matching rather than a switch
statement. You will need to add a new import statement at the top of ./src/interpreter.ts
:
import { match, P } from "ts-pattern";
interpExpression
Given a state object and an AST of an expression, interpExpression
evaluates the expression and returns its result. It should throw an error if the expression is invalid. See Behavior and Error Handling.
export function interpExpression(state: State, e: Expression): RuntimeValue {
// TODO
}
interpStatement
Given a state object and an AST of a statement, interpStatement
updates the State
object. It should throw an error if the statement is invalid. See Behavior and Error Handling.
export function interpStatement(state: State, p: Statement): void {
// TODO
}
interpProgram
Given the AST of a program, interpProgram
returns the final state of the program. It should throw an error if any statement or expression is invalid. See Behavior and Error Handling.
export function interpProgram(p: Statement[]): State {
// TODO
}
Testing
Approach
Implement interpExpression
, following the template shown in class. You can use an empty object ({ }
) for the state if you do not have any variables, or you can set the values of variables by hand. For example:
describe("interpExpression", () => {
it("evaluates multiplication with a variable", () => {
const r = interpExpression({ x: 10 }, parseExpression("x * 2"));
expect(r).toBe(20);
});
});
Implement interpStatement
and interpProgram
, following the template shown in class. You should be able to test that assignment updates variables. For example:
describe("interpProgram", () => {
it("handles declarations and reassignment", () => {
const st = interpProgram(parseProgram("let x = 10; x = 20;"));
expect(st).toEqual({ x: 20 });
});
});
Finally, test your interpreter with some simple programs. For example, you should be able to interpret an iterative factorial or Fibonacci sequence computation.
Capturing Errors
You may find yourself in a scenario where you need to write a test that verifies a program throws an error. Here is an example of how you would write a test like that:
function sqrt(n: number): number {
if (n < 0) throw new Error("Input must be positive or zero.");
// Do some iterations of Newton's method
}
test("sqrt fails on invalid input", () => {
// Must pass a function, otherwise the error will not be captured and the test will fail
expect(() => {
sqrt(-1);
}).toThrow();
});
You can read more on the Jest documentation on .toThrow()
.