Nifty - Part 3: Expression Parsing
Last time we looked at syntax analysis, creating an abstract syntax tree for a very small subsection of the Swift language. This week, we’ll take a look at parsing binary expressions, with literals, variables, function calls as well as prefix and infix operators, thrown in for good fun. We’ll also take a brief look at the declaration of a function, and wrap up by implementing control flow parsing for simple if conditions and while loops.
Operator-Precedence Parsing
Previously, we limited ourselves to expressions containing only one element
using the parsePrimary()
method. Clearly, these expressions are somewhat
limiting. We’d like to be able to parse expressions like the following.
let x = 1 + 2 * 3 - 4
Following typical operator precedence rules, the above
expression could be equivalently expressed as the following, performing the
highest precedence operator first (*
), followed by +
and
-
, which have the same precedence level and so are performed using
left associativity.
let x = (1 + (2 * 3)) - 4
However, our left-to-right parsing techniques so far would parse the expression as follows:
let x = (((1 + 2) * 3) - 4)
To resolve this issue, we introduce the concept of operator-precedence parsing.
By giving each operator a precedence value, we can capture the
binding precedence of operators. In the above example, where we encounted an
operator with higher precedence than another (*
binding more tightly than than +
),
we first parse the right hand side operation before combining the parsed expression with
an addition. If this seems unclear right now, don’t worry. Let’s take a look at the
implementation and operator precedence values, and all should become clear!
When encountering a primary, we need to now check for operators succeeding it:
private func parseStatement() -> SwiftAST? {
...
case .Identifier(_):
if let lhs = parsePrimary() {
if tokens.count == 0 {
return lhs
}
switch tokens[0] {
case .InfixOperator("=") where lhs.isAssignable:
return parseAssignment(lhs)
default:
return parseOperationRHS(precedence: 0, lhs: lhs)
}
} else {
return nil
}
...
}
Here, if the left hand side is assignable and we have encountered an =
,
we go on to parse the assignment (for more information, see
parseAssignment
). Otherwise, we perform parseOperationRHS
.
parseOperationRHS
is a recursive function. It takes the current
precedence of the left hand side, and the left hand side expression as arguments.
The base case of the function is when we have run out of tokens, in which case
we simply return the left hand side.
func parseOperationRHS(#precedence: Int, var lhs: SwiftExpr) -> SwiftExpr? {
// If there are no more tokens, return the lhs
if tokens.count == 0 {
return lhs
}
...
Suppose that we encounter an infix operator. Firstly, we retrieve the precedence
of that operator, and check it against our current precedence. If that operator
is of lower precedence, we finish parsing and return the left hand side. As
seen in parseStatement
, precedence
starts off at 0.
Consequently, for the first iteration, if an infix token exists it will be parsed.
// Otherwise, switch on the first token
switch tokens[0] {
...
case .InfixOperator(let op):
let tokenPrecedence = getOperatorPrecedence(op)
// If the token we have encountered does not bind as tightly as the current precedence, return the current expression
if tokenPrecedence < precedence {
return lhs
}
...
getOperatorPrecedence
simply looks up the operator in a precedence
dictionary and returns the corresponding value. A sample of the dictionary is
shown below:
var binaryOperatorPrecedence: Dictionary<String, Int> = [
...
"+": 140,
"-": 140,
"&+": 140,
"&-": 140,
"|": 140,
"^": 140,
...
]
By representing the precedence table as a mutable dictionary, we have the potential to add user-defined operators, with corresponding precedence, to the language.
These precedence values may seem magical, but they are actually part of the Swift language specification. Check out the chapter ‘Language Reference: Expressions’ in ‘The Swift Programming Language’ guide for the full operator precedence listings.
Next, we consume the operator and parse the next primary. If we have no further
tokens, we wrap up the lhs
, rhs
and op
into a SwiftBinaryExpression
node.
...
// Get next operand
consumeToken()
// Error handling
if let rhs = parsePrimary() {
// No further tokens
if tokens.count == 0 {
return SwiftBinaryExpression(op: op, lhs: lhs, rhs: rhs)
}
...
Otherwise, we now have two expressions and an operator. We now need to check
if there are further operators in order to assess their precedence. For example, if
we have currently parsed 1 + 2
, we need to check the next operator
to see if it has higher precedence than +
, as in our example. If
we encounter an infix operator, we retrieve its precedence.
...
// Get next operator
switch tokens[0] {
case .InfixOperator(let nextOp):
let nextTokenPrecedence = getOperatorPrecedence(nextOp)
...
If the token had higher precedence, we need to evaluate our current rhs, along with the operator and any futher operators with higher precedence (we add one to ensure we don’t include operators of the same precedence).
...
// The next token has higher precedence, parse from the rhs onwards.
if tokenPrecedence < nextTokenPrecedence {
if let newRhs = parseOperationRHS(precedence: tokenPrecedence, lhs: rhs) {
return parseOperationRHS(precedence: precedence + 1, lhs: SwiftBinaryExpression(op: op, lhs: lhs, rhs: newRhs))
} else {
return nil
}
}
...
Otherwise, we wrap up our expression into a SwiftBinaryExpression
node
and carry on parsing, with this as the new lhs.
...
return parseOperationRHS(precedence: precedence + 1, lhs: SwiftBinaryExpression(op: op, lhs: lhs, rhs: rhs))
...
Using this technique, we can support expressions of arbitrary length and complexity, as well as allowing for future extensibility through user-defined operators. Omitted here, but featured in the code, is the logic for parsing prefix and postfix operators, which simply binds the operator to the expression and continues parsing.
Function Declarations
So far, all of the code we have been parsing has been at the global scope. It’s time to change that with function declaration parsing.
Take a simple function:
func inc(x: Int) -> Int {
return x + 1
}
A function declaration is composed of two parts:
- A prototype. This is what we call the function type signature. It defines the number of arguments to a function, their type, the function’s return type and of course the identifier of the function.
- A body or closure. This is the body of code that, when calling the function, will be executed.
These components will be represented by the SwiftFunctionPrototype
and SwiftFunctionBody
nodes, both of which compose the SwiftFunctionDeclaration
node.
Let’s start off with an extra case in parseStatement
:
private func parseStatement() -> SwiftAST? {
switch tokens[0] {
...
case .Function:
return parseFunctionDeclaration()
}
...
parseFunctionDeclaration
starts off by checking for the existence
of a function token and consuming it.
func parseFunctionDeclaration() -> SwiftFunctionDeclaration? {
let context = self.lineContext[0]
switch tokens[0] {
case .Function:
consumeToken()
default:
errors.append(SCError(message: "Missing expected 'func'.", lineContext: self.lineContext[0]))
return nil
}
...
Next, we simply parse the function’s identifier. Nothing too strenuous.
...
let functionIdentifier: String
switch tokens[0] {
case .Identifier(let string):
functionIdentifier = string
consumeToken()
default:
errors.append(SCError(message: "Missing expected function identifier.", lineContext: self.lineContext[0]))
return nil
}
...
Now onto something more interesting, the parameters of the function. parseFunctionDeclaration
calls parseFunctionParameters
. parseFunctionParameters
does the usual job of checking for a token (a left bracket) and consuming it.
We then use a nested function to actually perform the job of iterating through
the parameter list:
func parseParams() -> [SwiftFunctionParameter]? {
var functionParameters: [SwiftFunctionParameter] = [SwiftFunctionParameter]()
while true {
switch tokens[0] {
case .RightBracket:
return functionParameters
case .Comma:
consumeToken()
default:
if let arg = parseStorageDeclaration(isFunctionParameter: true) as? SwiftFunctionParameter {
functionParameters.append(arg)
} else {
return nil
}
}
}
}
We simply iterate through the list, parsing storage declarations as we would
for any variable or let binding. We are able to reuse the parsing code for a
typical declaration, which includes optional types and assignments. The only
differentiation is that by passing True
for
isFunctionParameter
, a SwiftFunctionParameter
node is
returned instead of the more generic SwiftDeclarationNode
.
parseFunctionParameters
then calls this nested function, tidies
up by consuming the right bracket and returns.
Back to parseFunctionDeclaration
, we now check for a return type.
If no return type is specified, we default to Void
. Otherwise,
we parse the type.
...
let returnType: SwiftType
let returnTypeContext = self.lineContext[0]
switch tokens[0] {
case .Returns:
consumeToken()
if let type = parseType() {
returnType = type
} else {
return nil
}
default:
// If not specified, function type is Void
returnType = SwiftTypeIdentifier(identifier: "Void", lineContext: context)
}
...
All that’s left is to parse the body of the function:
...
var body: SwiftFunctionBody?
if (tokens.isEmpty) {
errors.append(SCError(message: "Function body not specified.", lineContext: returnTypeContext))
return nil
}
let bodyContext = self.lineContext[0]
switch tokens[0] {
case .LeftBrace:
if let closure = parseBody(bracesRequired: true) {
body = SwiftFunctionBody(closure, lineContext: bodyContext)
fallthrough
} else {
return nil
}
default:
return SwiftFunctionDeclaration(id: functionIdentifier, parameters: functionParameters, returnType: returnType, body: body, lineContext: context)
}
}
We make use of the parseBody
function, this time requiring brackets
to enclose the body. parseBody
simply validates the presence of
brackets and parses a number of statements.
Phew, you made it through. As we’ve seen, functions have a number of different components, but using the parsing logic we’ve built so far, parsing is pretty straightforward.
Function Calls
Function declarations aren’t very useful without the ability to call them. We
introduce a SwiftCall
node, containing the identifier of the
function call and the arguments passed to the call, and the corresponding code
to parse and create a SwiftCall
node. Function calls look like
normal identifiers up to the brackets, so we add the following code to
parseIdentifierExpression
:
private func parseIdentifierExpression() -> SwiftExpr? {
...
// Check if it's a function call
if tokens.count == 0 {
return identifier
}
switch tokens[0] {
case .LeftBracket:
consumeToken()
var args = [SwiftExpr]()
while true {
if tokens.count == 0 {
errors.append(SCError(message: "Missing expected ')' in function argument list.", lineContext: self.lineContext[0]))
return nil
}
switch tokens[0] {
case .RightBracket:
consumeToken()
return SwiftCall(identifier: identifier, arguments: args)
case .Comma:
consumeToken()
default:
if let exp = parseExpression() {
args.append(exp)
} else {
return nil
}
}
}
default:
return identifier
}
}
The code simply iterates through the arguments passed to the function, storing
them into an array. When it hits the right bracket, it returns the SwiftCall
node associated with the identifier and the arguments.
Control Flow Parsing
We’ll finish off by taking a look at parsing control flow. For now, we’ll just
cover the simplest of control flow constructs: if
and while
.
Both of these conditional constructs follow the same format, with a boolean expression
for the condition of the loop, and a body. It is the semantics of these two constructs
that differ. As a result, we can parse both of these in the same way. First, we
add the following code to parseStatement
:
private func parseStatement() -> SwiftAST? {
switch tokens[0] {
...
case .While:
fallthrough
case .If:
return parseConditionalStatement()
...
}
First we consume and store the token, corresponding to the if
or
while
.
func parseConditionalStatement() -> SwiftConditionalStatement? {
let context = self.lineContext[0]
var token: SwiftToken
switch tokens[0] {
case .If:
fallthrough
case .While:
token = tokens[0]
consumeToken()
default:
errors.append(SCError(message: "Missing expected 'while'.", lineContext: self.lineContext[0]))
return nil
}
...
All that’s left is to parse the remaining expression corresponding to the conditional, and return the relevant node.
...
if let cond = parseExpression(), let body = parseBody(bracesRequired: true) {
switch token {
case .While:
return SwiftWhileStatement(condition: cond, body: body, lineContext: context)
case .If:
return SwiftIfStatement(condition: cond, body: body, lineContext: context)
default:
return nil
}
}
Here, we’ve covered two basic control flow constructs. Of course, Swift also features
else
blocks, for
loops and switch
cases,
amongst others. Feel free to implement these yourself!
Conclusion
Next time, we’ll take a look at more complex type parsing, allowing us to represent compound types, like function and tuple types.