SU-CS143 APR242025 — Jemoka Knowledge Base

Bottom-Up Parsing Bottom-Up Parsing is more general than top-down parsing (i.e., it doesn’t require left factorization, etc.) This is generally the preferred method. Key insight: we reduce strings into increasingly higher level symbols Sketch Because of the Properties of Bottom-Up Parsing, we can…. initialization split the string into two pieces right substring has terminals, not yet examined left substring has terminals and non-terminals We begin by splitting to the left since we begin with no non-terminals. x1 x2 x3 …. xn moves Shift / Reduce shift: move | one place to the right — shifts a terminal to the left string reduce: apply an inverse production at the right end of the left string (i.e., take a substring closest to the bar, and reduce it) Think: left string is a stack. top: is the |; push: the “shift” operation to push a terminal onto the stack; reduce: pop something onto the stack. conflict When Shift/Reduce are both legal, that’s a “shift reduce” conflict; if multiple reductions are possible, that’s a “reduce reduce” conflict. deciding moves handle We want to reduce only if the result can still be reduced to the starting symbol. A handle is a single production in a legal sequence coming from S which can be reduced into a single nonterminal. Consider:

\begin{equation} S \to^{*} \alpha X \omega \to \alpha \beta \omega \end{equation}

then X \to \beta in the position after \alpha is a handle of \alpha \beta \omega. We reduce at handles. The handles are always top of the stack, otherwise it’d not be a rightmost derivation in reverse. SLR Simple Left-to-right Rightmost derivation is a class of CFGs for which we have simple heuristics for detecting handles. heuristics item An item is a production with “.” somewhere on the RHS, denoting a focus point. For instance, T \to .(E). The only item for X \to \epsilon is X → . We also call these LR(0) items. viable prefix \alpha is a “viable prefix” if there exists \omega such that \alpha | \omega is a state of a shift-reduce parser. a viable prefix doesn’t extend beyond the right end of handle its a variable prefix because its a possible prefix of a handle as long as a parser has viable prefixes on the stack, no parsing error detecting viable prefixs We us an NFA! add a new start production S’ \to S to the grammar the NFA states are items of the grammar For item E \to \alpha . X \beta add the transition [E \to \alpha . X \beta] \to^{x} [E \to \alpha X . \beta] For item E \to \alpha . X \beta and production X \to \gamma, add the transition [E \to \alpha . X \beta] \to^{\epsilon} [X \to . \gamma] every state is an accepting state, start state is S’ \to .S LR(0) Parsing Suppose stack contains \alpha, next input is t. DFA on input \alpha terminated it state s. Reduce by X \to \beta if s contains item X → β . Shift if s contains item X \to \beta .t\omega (that is, there’s a transition labeled t We will get reduce/reduce conflicts if one state contains X → β . and Y → ω .; we will get shift/reduce conflict if X → β. and Y \to \omega .t \gamma LR(0) to SLR Exactly the same process, but! Suppose stack contains \alpha, next input is t. DFA on input \alpha terminated it state s. Reduce by X \to \beta if s contains item X → β.; and if t \in \text{Follow}\left(X\right) Shift if s contains item X \to \beta .t\omega (that is, there’s a transition labeled t If there are conflicts under these rules, then you don’t have an SLR grammar. Remember to keep (symbol, DFA state) on the stack. Properties of Bottom-Up Parsing bottom-up parser traces a rightmost derivation in reverse in shift-reduce strategy, handles appear only at the top of the stack, never inside the stack for any SLR grammar, the set of viable prefixes is a regular language! Let \alpha \beta \omega be a step of bottom up parse, assume the next reduction is X \to \beta, then \omega should already have been a string of terminals (since we go rightmost first) Example int * int + int | T -> int int * T + int | T -> int * T T + int | T -> int T + T | E -> T T + E | E -> T + E E Weird! It somehow starts in the middle. Generally, the parsing strategy expands non-terminals bottom-up in the rightmost first.