### CS52 - Spring 2017 - Class 12

#### Lecture notes

- Midterm 2 next Thursday
- Mentor sessions tonight and Wednesday will have specific breakouts for reviewing
- Assignment 5 out
- Can work in pairs

• Assignment 5 high-level: write a compiler for mathematical expressions in SML
- characters -> tokens -> parse tree -> CS52 machine assembly instructions
- a model of how a normal compiler might work:
- takes a string of characters as input
- does an initial pass to break those characters into tokens, i.e. units
- from these tokens, creates an internal representation of the string, called the "abstract syntax tree"
- e.g. https://en.wikipedia.org/wiki/Abstract_syntax_tree
- From the abstract syntax tree, we now can generate the assembly code corresponding to the original input
- three key stages:
1) scan: characters to tokens
2) parse: tokens to syntax tree (internal representation)
3) encode: syntax tree to CS52 assembly

• For now, let's focus on these first two steps

• Example mathematical expressions:
- 2 + 2; (we're going to terminate all of our expressions with a semicolon, like in SML)
- 2 * 3 - 4;
- 2 * (3 - 4);
- 2 + ~(3 - 4) % 2;
- 1 - 2 - 3 - 4;

• Parse structure
- For each of these mathematical expressions there is a unique parse tree
- 2 + 2;

        Plus
       /    \
      2      2

- 2 * 3 - 4

           Minus
          /     \
       Times     4
       /   \
      2     3

- 2 * (3 - 4)

       Times
       /   \
      2    Minus
           /   \
          3     4

- 2 * ~(3 - 4)

       Times
       /   \
      2    Negate
             |
           Minus
           /   \
          3     4

- 1 - 2 - 3 - 4

                Minus
               /     \
           Minus      4
          /     \
       Minus     3
       /   \
      1     2

*** NOT ***
       Minus
       /   \
      1    Minus
           /   \
          2    Minus
               /   \
              3     4
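
To see why the choice between these two trees matters, here is a minimal sketch that evaluates both. It is written in Python rather than the assignment's SML, and the tuple encoding of trees is an assumption made for illustration only; it is not the syntaxTree type from the template.

```python
# Hypothetical tuple encoding of a syntax tree (not the assignment's SML type):
#   ("num", n)               a number leaf
#   ("minus", left, right)   a subtraction node

def num(n):
    return ("num", n)

def evaluate(tree):
    """Evaluate a tree built only from numbers and subtraction."""
    if tree[0] == "num":
        return tree[1]
    _, left, right = tree
    return evaluate(left) - evaluate(right)

# Left-leaning tree for 1 - 2 - 3 - 4, i.e. ((1 - 2) - 3) - 4
left_tree = ("minus", ("minus", ("minus", num(1), num(2)), num(3)), num(4))

# The "NOT" tree, i.e. 1 - (2 - (3 - 4))
right_tree = ("minus", num(1), ("minus", num(2), ("minus", num(3), num(4))))

print(evaluate(left_tree))   # -8
print(evaluate(right_tree))  # -2
```

The two trees give different answers, so the parser has to build the left-leaning one for subtraction to behave as expected.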

• To help us generate these trees, we're going to utilize a grammar

• Look at EBNF for standard arithmetic expressions from assignment 5
- 7 production rules roughly corresponding to different components that you might have
- S: high level expression
- E (expression): any expression consisting of + or -
- T (term): may incorporate *, / or %
- F (factor): may include negation
- P (positive factor): may include parentheses
- N (number)
- D (digit)

- All of the examples above can be generated from this grammar, e.g.
S
E;
T-T;
F*F-F;
P*P-P;
N*N-N;
2*3-4;

• parsing
- write at least one function for each production rule

- should translate directly from the EBNF rules

- each of these functions should:
- take a list of tokens as input
- consume as many tokens as needed for that particular component
- return a tuple:
1) the syntax tree from the created component
2) a list of the remaining tokens that were not processed (i.e. were not part of this component)

• Look at the template file for assignment 5
- What is oper?
- What is token?
- What is syntaxTree?

• parsing concretely
- each of these functions should:
- take a list of tokens as input
- token list
- consume as many tokens as needed for that particular component
- return a tuple:
1) the syntax tree from the created component
- syntaxTree
2) a list of the remaining tokens that were not processed (i.e. were not part of this component)
- token list

• Look at the term function

• Advice on how to proceed: work bottom up and debug as you go!
1. Write the number function.
2. Write the positive factor function without handling parentheses. For now, pretend the EBNF clause is just P ::= N.
3. Write the factor function.
4. Copy the term function from the appendix.
5. Write the expression function.
6. Go back and fix the positive factor function to include parentheses.
7. Write parse.

• Where does the token list come from:
- scan: char list -> token list
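
A rough sketch of what a scanner does, again in Python rather than SML. The token encoding here (ints for numbers, one-character strings for operators, parentheses and the semicolon) is an assumption for illustration and is not the token type from the assignment template.

```python
DIGITS = "0123456789"
SYMBOLS = "+-*/%~();"

def scan(chars):
    """Turn a string into a list of tokens: ints and one-character symbols."""
    tokens = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isspace():
            i += 1                      # whitespace separates tokens but is not one
        elif c in DIGITS:
            j = i
            while j < len(chars) and chars[j] in DIGITS:
                j += 1                  # gather the whole run of digits
            tokens.append(int(chars[i:j]))
            i = j
        elif c in SYMBOLS:
            tokens.append(c)
            i += 1
        else:
            raise ValueError(f"unexpected character {c!r}")
    return tokens

print(scan("2 * (3 - 4);"))  # [2, '*', '(', 3, '-', 4, ')', ';']
```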

• How do we now output CS52 assembly code to do these mathematical computations?!?

• stack based model of computation
- the stack based model of computation utilizes a stack to perform mathematical expressions

- For example, for simple left associative operators of the same precedence (like plus and minus) you can do computation by:
- push the first operand
- repeat
- push the next operand
- pop, pop and perform the next operation on the two popped values
- push the result
- pop the final result when no more operators exist

- For example:
1 - 2 - 3 - 4

1
----
Stack

2
1
----
Stack

pop, pop and perform subtraction

-1
----
Stack

3
-1
----
Stack

pop, pop and perform subtraction

-4
----
Stack

4
-4
----
Stack

pop, pop and perform subtraction

-8
----
Stack
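
The push/pop loop traced above can be written out as a short Python sketch. The function name and the way the input is passed (a first operand plus a list of operator/operand pairs) are choices made for this illustration only.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub}

def eval_chain(first, pairs):
    """Evaluate first op1 x1 op2 x2 ... left to right using an explicit stack."""
    stack = [first]                          # push the first operand
    for op, operand in pairs:
        stack.append(operand)                # push the next operand
        right = stack.pop()                  # pop: the right-hand value comes off first
        left = stack.pop()                   # pop: then the left-hand value
        stack.append(OPS[op](left, right))   # push the result
    return stack.pop()                       # pop the final result

print(eval_chain(1, [("-", 2), ("-", 3), ("-", 4)]))  # -8
```

Note the order of the two pops: the value pushed last is the right-hand operand, which matters for subtraction.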

• Syntax tree -> stack based model of computation
- In general, we need to be a bit more sophisticated about how we perform the pushes and pops, though not by much
- The recursive nature of the syntax tree can help us recursively perform stack based computation:
- if it's a number: push it onto the stack
- if it's not:
- recurse on the left
- recurse on the right
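- pop, pop, perform the operation and push the result on the stack
- (a unary operator such as negation has a single child: recurse on it, pop one value, perform the operation and push the result)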
- when all done, pop the answer
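
The recursive procedure can be sketched in Python as below, with a one-operand case added for negation. The tuple encoding of trees and the function names are assumptions for illustration, and only +, - and * are handled to keep the sketch short.

```python
import operator

BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def run(tree, stack):
    """Walk the tree, leaving the value of `tree` on top of `stack`."""
    kind = tree[0]
    if kind == "num":
        stack.append(tree[1])             # a number: push it
    elif kind == "neg":
        run(tree[1], stack)               # unary: recurse on the only child
        stack.append(-stack.pop())        # pop one, negate, push
    else:
        _, left, right = tree
        run(left, stack)                  # recurse on the left
        run(right, stack)                 # recurse on the right
        r = stack.pop()                   # pop the right-hand value first
        l = stack.pop()                   # then the left-hand value
        stack.append(BINARY[kind](l, r))  # perform the operation, push the result

def evaluate(tree):
    stack = []
    run(tree, stack)
    return stack.pop()                    # when all done, pop the answer

# 2 * 3 - 4
print(evaluate(("-", ("*", ("num", 2), ("num", 3)), ("num", 4))))  # 2
```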

- For example: 2 * 3 - 4;

           Minus
          /     \
       Times     4
       /   \
      2     3

push 2

2
----
Stack

push 3

3
2
----
Stack

pop top two, perform multiplication, and push answer on stack

6
----
Stack

push 4

4
6
----
Stack

pop top two, perform the subtraction, and push answer on the stack

2
----
Stack

- Another example: 2 * (3 - 4);

       Times
       /   \
      2    Minus
           /   \
          3     4

push 2

2
----
Stack

push 3

3
2
----
Stack

push 4

4
3
2
----
Stack

pop top two, perform the subtraction, and push answer on the stack

-1
2
----
Stack

pop top two, perform the multiplication, and push the answer on the stack

-2
----
Stack

- A final example: 2 * ~(3 - 4);

       Times
       /   \
      2    Negate
             |
           Minus
           /   \
          3     4

push 2

2
----
Stack

push 3

3
2
----
Stack

push 4

4
3
2
----
Stack

pop top two, perform the subtraction, and push answer on the stack

-1
2
----
Stack

pop one (it's a unary operator), perform the negation, and push the answer on the stack

1
2
----
Stack

pop top two, perform the multiplication, and push the answer on the stack

2
----
Stack

• encode
- don't overthink it!
- let the recursion and the stack do the work
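
To make "let the recursion and the stack do the work" concrete, here is a Python sketch that turns a tree into a flat list of instructions instead of computing a value. The instruction names (PUSH, NEG, ADD, SUB, MUL) are made up for a generic stack machine and are not CS52 assembly; the tuple encoding of trees is likewise an assumption.

```python
# Hypothetical mnemonics for a generic stack machine (NOT CS52 assembly).
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL"}

def encode(tree):
    """Return the instruction list whose execution leaves the tree's value on the stack."""
    kind = tree[0]
    if kind == "num":
        return [f"PUSH {tree[1]}"]
    if kind == "neg":
        return encode(tree[1]) + ["NEG"]
    _, left, right = tree
    # Code for the left child, then the right child, then the operator itself.
    return encode(left) + encode(right) + [MNEMONIC[kind]]

# 2 * 3 - 4
for line in encode(("-", ("*", ("num", 2), ("num", 3)), ("num", 4))):
    print(line)
```

The output order (PUSH 2, PUSH 3, MUL, PUSH 4, SUB) matches the sequence of steps in the first worked example above; the structure of the function mirrors the structure of the tree, and nothing more is needed.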