Nifty - Part 3: Expression Parsing

Last time we looked at syntax analysis, creating an abstract syntax tree for a very small subsection of the Swift language. This week, we’ll take a look at parsing binary expressions, with literals, variables, function calls as well as prefix and infix operators, thrown in for good fun. We’ll also take a brief look at the declaration of a function, and wrap up by implementing control flow parsing for simple if conditions and while loops.

Operator-Precedence Parsing

Previously, we limited ourselves to expressions containing only one element using the parsePrimary() method. Clearly, these expressions are somewhat limiting. We’d like to be able to parse expressions like the following.

let x = 1 + 2 * 3 - 4

Following typical operator precedence rules, the above expression could be equivalently expressed as the following, performing the highest precedence operator first (*), followed by + and -, which have the same precedence level and so are performed using left associativity.

let x = (1 + (2 * 3)) - 4

However, our left-to-right parsing techniques so far would parse the expression as follows:

let x = (((1 + 2) * 3) - 4)

To resolve this issue, we introduce the concept of operator-precedence parsing. By giving each operator a precedence value, we can capture the binding precedence of operators. In the above example, where we encounted an operator with higher precedence than another (* binding more tightly than than +), we first parse the right hand side operation before combining the parsed expression with an addition. If this seems unclear right now, don’t worry. Let’s take a look at the implementation and operator precedence values, and all should become clear!

When encountering a primary, we need to now check for operators succeeding it:

private func parseStatement() -> SwiftAST? {
...
  case .Identifier(_):
    if let lhs = parsePrimary() {
        if tokens.count == 0 {
            return lhs
        }
        switch tokens[0] {
        case .InfixOperator("=") where lhs.isAssignable:
            return parseAssignment(lhs)
        default:
            return parseOperationRHS(precedence: 0, lhs: lhs)
        }
    } else {
        return nil
    }
...
}

Here, if the left hand side is assignable and we have encountered an =, we go on to parse the assignment (for more information, see parseAssignment). Otherwise, we perform parseOperationRHS.

parseOperationRHS is a recursive function. It takes the current precedence of the left hand side, and the left hand side expression as arguments. The base case of the function is when we have run out of tokens, in which case we simply return the left hand side.

func parseOperationRHS(#precedence: Int, var lhs: SwiftExpr) -> SwiftExpr? {
  // If there are no more tokens, return the lhs
  if tokens.count == 0 {
    return lhs
  }
  ...

Suppose that we encounter an infix operator. Firstly, we retrieve the precedence of that operator, and check it against our current precedence. If that operator is of lower precedence, we finish parsing and return the left hand side. As seen in parseStatement, precedence starts off at 0. Consequently, for the first iteration, if an infix token exists it will be parsed.

// Otherwise, switch on the first token
  switch tokens[0] {
  ...
  case .InfixOperator(let op):
      let tokenPrecedence = getOperatorPrecedence(op)
      // If the token we have encountered does not bind as tightly as the current precedence, return the current expression
      if tokenPrecedence < precedence {
        return lhs
      }
  ...

getOperatorPrecedence simply looks up the operator in a precedence dictionary and returns the corresponding value. A sample of the dictionary is shown below:

var binaryOperatorPrecedence: Dictionary<String, Int> = [
...
  "+":    140,
  "-":    140,
  "&+":   140,
  "&-":   140,
  "|":    140,
  "^":    140,
...
]

By representing the precedence table as a mutable dictionary, we have the potential to add user-defined operators, with corresponding precedence, to the language.

These precedence values may seem magical, but they are actually part of the Swift language specification. Check out the chapter ‘Language Reference: Expressions’ in ‘The Swift Programming Language’ guide for the full operator precedence listings.

Next, we consume the operator and parse the next primary. If we have no further tokens, we wrap up the lhs, rhs and op into a SwiftBinaryExpression node.

...
// Get next operand
consumeToken()

// Error handling
if let rhs = parsePrimary() {
  // No further tokens
  if tokens.count == 0 {
    return SwiftBinaryExpression(op: op, lhs: lhs, rhs: rhs)
  }
...

Otherwise, we now have two expressions and an operator. We now need to check if there are further operators in order to assess their precedence. For example, if we have currently parsed 1 + 2, we need to check the next operator to see if it has higher precedence than +, as in our example. If we encounter an infix operator, we retrieve its precedence.

...
  // Get next operator
  switch tokens[0] {
  case .InfixOperator(let nextOp):
    let nextTokenPrecedence = getOperatorPrecedence(nextOp)
...

If the token had higher precedence, we need to evaluate our current rhs, along with the operator and any futher operators with higher precedence (we add one to ensure we don’t include operators of the same precedence).

...
    // The next token has higher precedence, parse from the rhs onwards.
    if tokenPrecedence < nextTokenPrecedence {
      if let newRhs = parseOperationRHS(precedence: tokenPrecedence, lhs: rhs) {
        return parseOperationRHS(precedence: precedence + 1, lhs: SwiftBinaryExpression(op: op, lhs: lhs, rhs: newRhs))
      } else {
        return nil
      }
    }
...

Otherwise, we wrap up our expression into a SwiftBinaryExpression node and carry on parsing, with this as the new lhs.

...
    return parseOperationRHS(precedence: precedence + 1, lhs: SwiftBinaryExpression(op: op, lhs: lhs, rhs: rhs))
...

Using this technique, we can support expressions of arbitrary length and complexity, as well as allowing for future extensibility through user-defined operators. Omitted here, but featured in the code, is the logic for parsing prefix and postfix operators, which simply binds the operator to the expression and continues parsing.

Function Declarations

So far, all of the code we have been parsing has been at the global scope. It’s time to change that with function declaration parsing.

Take a simple function:

func inc(x: Int) -> Int {
  return x + 1
}

A function declaration is composed of two parts:

A prototype. This is what we call the function type signature. It defines the number of arguments to a function, their type, the function’s return type and of course the identifier of the function.
A body or closure. This is the body of code that, when calling the function, will be executed.

These components will be represented by the SwiftFunctionPrototype and SwiftFunctionBody nodes, both of which compose the SwiftFunctionDeclaration node.

Let’s start off with an extra case in parseStatement:

private func parseStatement() -> SwiftAST? {
  switch tokens[0] {
  ...
  case .Function:
      return parseFunctionDeclaration()
  }
  ...

parseFunctionDeclaration starts off by checking for the existence of a function token and consuming it.

func parseFunctionDeclaration() -> SwiftFunctionDeclaration? {
  let context = self.lineContext[0]
  switch tokens[0] {
  case .Function:
    consumeToken()
  default:
    errors.append(SCError(message: "Missing expected 'func'.", lineContext: self.lineContext[0]))
    return nil
  }
  ...

Next, we simply parse the function’s identifier. Nothing too strenuous.

...
  let functionIdentifier: String
  switch tokens[0] {
  case .Identifier(let string):
    functionIdentifier = string
    consumeToken()
  default:
    errors.append(SCError(message: "Missing expected function identifier.", lineContext: self.lineContext[0]))
    return nil
  }
  ...

Now onto something more interesting, the parameters of the function. parseFunctionDeclaration calls parseFunctionParameters. parseFunctionParameters does the usual job of checking for a token (a left bracket) and consuming it. We then use a nested function to actually perform the job of iterating through the parameter list:

func parseParams() -> [SwiftFunctionParameter]? {
  var functionParameters: [SwiftFunctionParameter] = [SwiftFunctionParameter]()
  while true {
    switch tokens[0] {
    case .RightBracket:
      return functionParameters
    case .Comma:
      consumeToken()
    default:
      if let arg = parseStorageDeclaration(isFunctionParameter: true) as? SwiftFunctionParameter {
        functionParameters.append(arg)
      } else {
        return nil
      }
    }
  }
}

We simply iterate through the list, parsing storage declarations as we would for any variable or let binding. We are able to reuse the parsing code for a typical declaration, which includes optional types and assignments. The only differentiation is that by passing True for isFunctionParameter, a SwiftFunctionParameter node is returned instead of the more generic SwiftDeclarationNode.

parseFunctionParameters then calls this nested function, tidies up by consuming the right bracket and returns.

Back to parseFunctionDeclaration, we now check for a return type. If no return type is specified, we default to Void. Otherwise, we parse the type.

...
  let returnType: SwiftType

  let returnTypeContext = self.lineContext[0]
  switch tokens[0] {
  case .Returns:
    consumeToken()
    if let type = parseType() {
      returnType = type
    } else {
      return nil
    }
  default:
    // If not specified, function type is Void
    returnType = SwiftTypeIdentifier(identifier: "Void", lineContext: context)
  }
  ...

All that’s left is to parse the body of the function:

...
  var body: SwiftFunctionBody?
  if (tokens.isEmpty) {
    errors.append(SCError(message: "Function body not specified.", lineContext: returnTypeContext))
    return nil
  }
  let bodyContext = self.lineContext[0]
  switch tokens[0] {
  case .LeftBrace:
    if let closure = parseBody(bracesRequired: true) {
      body = SwiftFunctionBody(closure, lineContext: bodyContext)
      fallthrough
    } else {
      return nil
    }
  default:
    return SwiftFunctionDeclaration(id: functionIdentifier, parameters: functionParameters, returnType: returnType, body: body, lineContext: context)
  }
}

We make use of the parseBody function, this time requiring brackets to enclose the body. parseBody simply validates the presence of brackets and parses a number of statements.

Phew, you made it through. As we’ve seen, functions have a number of different components, but using the parsing logic we’ve built so far, parsing is pretty straightforward.

Function Calls

Function declarations aren’t very useful without the ability to call them. We introduce a SwiftCall node, containing the identifier of the function call and the arguments passed to the call, and the corresponding code to parse and create a SwiftCall node. Function calls look like normal identifiers up to the brackets, so we add the following code to parseIdentifierExpression:

private func parseIdentifierExpression() -> SwiftExpr? {
  ...
  // Check if it's a function call
  if tokens.count == 0 {
      return identifier
  }
  switch tokens[0] {
  case .LeftBracket:
    consumeToken()
    var args = [SwiftExpr]()
    while true {
      if tokens.count == 0 {
        errors.append(SCError(message: "Missing expected ')' in function argument list.", lineContext: self.lineContext[0]))
        return nil
      }
      switch tokens[0] {
      case .RightBracket:
        consumeToken()
        return SwiftCall(identifier: identifier, arguments: args)
      case .Comma:
        consumeToken()
      default:
        if let exp = parseExpression() {
          args.append(exp)
        } else {
          return nil
        }
      }
    }
  default:
    return identifier
  }
}

The code simply iterates through the arguments passed to the function, storing them into an array. When it hits the right bracket, it returns the SwiftCall node associated with the identifier and the arguments.

Control Flow Parsing

We’ll finish off by taking a look at parsing control flow. For now, we’ll just cover the simplest of control flow constructs: if and while. Both of these conditional constructs follow the same format, with a boolean expression for the condition of the loop, and a body. It is the semantics of these two constructs that differ. As a result, we can parse both of these in the same way. First, we add the following code to parseStatement:

private func parseStatement() -> SwiftAST? {
  switch tokens[0] {
  ...
  case .While:
    fallthrough
  case .If:
    return parseConditionalStatement()
  ...
}

First we consume and store the token, corresponding to the if or while.

func parseConditionalStatement() -> SwiftConditionalStatement? {
  let context = self.lineContext[0]
  var token: SwiftToken
  switch tokens[0] {
  case .If:
    fallthrough
  case .While:
    token = tokens[0]
    consumeToken()
  default:
    errors.append(SCError(message: "Missing expected 'while'.", lineContext: self.lineContext[0]))
    return nil
  }
  ...

All that’s left is to parse the remaining expression corresponding to the conditional, and return the relevant node.

...
if let cond = parseExpression(), let body = parseBody(bracesRequired: true) {
  switch token {
  case .While:
    return SwiftWhileStatement(condition: cond, body: body, lineContext: context)
  case .If:
    return SwiftIfStatement(condition: cond, body: body, lineContext: context)
  default:
    return nil
  }
}

Here, we’ve covered two basic control flow constructs. Of course, Swift also features else blocks, for loops and switch cases, amongst others. Feel free to implement these yourself!

Conclusion

Next time, we’ll take a look at more complex type parsing, allowing us to represent compound types, like function and tuple types.