Sunday, May 16, 2010

Parsing notes

Top-down vs. bottom-up

Top-down parsing (e.g. ANTLR) produces a leftmost derivation (LL)
Often implemented as a recursive-descent algorithm (LL parsers can also be table-driven)
At each step, the leftmost non-terminal in the current left sentential form is replaced with the RHS of one of its rules.
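
A minimal sketch in Python (this is not ANTLR's generated code): a hand-written recursive-descent parser for a small, hypothetical expression grammar. Each parsing function expands one non-terminal, and because the calls happen left to right, the recorded productions form a leftmost derivation. The grammar and all class/function names are assumptions made for illustration.

```python
# Hypothetical grammar (no left recursion, so recursive descent works):
#   E  -> T E'
#   E' -> + T E' | ε
#   T  -> id | ( E )
NONTERMINALS = {"E", "E'", "T"}

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]  # "$" marks the end of input
        self.pos = 0
        self.steps = []               # productions, in the order applied

    def peek(self):
        return self.tokens[self.pos]

    def match(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def E(self):
        self.steps.append(("E", ["T", "E'"]))
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.peek() == "+":
            self.steps.append(("E'", ["+", "T", "E'"]))
            self.match("+")
            self.T()
            self.E_prime()
        else:
            self.steps.append(("E'", []))  # E' -> ε

    def T(self):
        if self.peek() == "id":
            self.steps.append(("T", ["id"]))
            self.match("id")
        elif self.peek() == "(":
            self.steps.append(("T", ["(", "E", ")"]))
            self.match("(")
            self.E()
            self.match(")")
        else:
            raise SyntaxError(f"unexpected {self.peek()!r}")

def parse(tokens):
    p = Parser(tokens)
    p.E()
    p.match("$")
    return p.steps

def leftmost_forms(steps):
    """Replay the productions, always rewriting the leftmost non-terminal."""
    form = ["E"]
    forms = [" ".join(form)]
    for lhs, rhs in steps:
        i = next(k for k, s in enumerate(form) if s in NONTERMINALS)
        assert form[i] == lhs
        form = form[:i] + rhs + form[i + 1:]
        forms.append(" ".join(form))
    return forms
```

For example, `leftmost_forms(parse(["id", "+", "id"]))` returns the forms `E`, `T E'`, `id E'`, `id + T E'`, `id + id E'`, `id + id`.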

Bottom-up parsing (such as yacc or bison) traces a rightmost derivation in reverse (LR)
At each step, it finds the handle in the current right sentential form (a substring matching the RHS of a rule) and replaces it with that rule's LHS non-terminal.
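
A minimal sketch of this reduce step, assuming a hypothetical toy grammar E -> E + n | n. The handle is recognized by two hard-coded checks, which only works for this particular grammar; a real LR parser finds handles with a state table. Reading the returned list of sentential forms backwards gives a rightmost derivation.

```python
# Hypothetical toy grammar:
#   E -> E + n
#   E -> n
def reduce_to_start(tokens):
    """Shift tokens and reduce handles until only E is left.

    Returns every right sentential form seen (stack contents + unread input).
    """
    stack = []
    rest = list(tokens)
    forms = [" ".join(tokens)]
    while True:
        if stack[-3:] == ["E", "+", "n"]:
            stack[-3:] = ["E"]         # handle "E + n": reduce by E -> E + n
        elif stack == ["n"]:
            stack = ["E"]              # handle "n": reduce by E -> n
        elif rest:
            stack.append(rest.pop(0))  # no handle on top: shift next token
            continue
        else:
            break
        forms.append(" ".join(stack + rest))
    if stack != ["E"]:
        raise SyntaxError(f"cannot reduce {' '.join(tokens)!r} to E")
    return forms
```

For example, `reduce_to_start(["n", "+", "n", "+", "n"])` returns `["n + n + n", "E + n + n", "E + n", "E"]`; reversed, that is the rightmost derivation E => E + n => E + n + n => n + n + n.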

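General parsing algorithms that work for any context-free grammar (e.g. CYK, Earley) are O(n^3) in the worst case; practical LL and LR parsers handle only a subset of grammars but run in O(n).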
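
The CYK algorithm is one standard example of a general algorithm with the O(n^3) bound. A minimal recognizer sketch, assuming the grammar is already in Chomsky normal form; the three nested loops over span length, start position and split point give the n^3 factor. The example grammar (for a^n b^n, n >= 1) and the dictionary layout are assumptions for illustration.

```python
from itertools import product

def cyk(word, unary, binary, start="S"):
    """Return True if `word` is derivable from `start`.

    unary:  dict terminal -> set of A with a rule A -> terminal
    binary: dict (B, C)   -> set of A with a rule A -> B C
    """
    n = len(word)
    if n == 0:
        return False
    # table[i][l] = non-terminals deriving word[i : i + l + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(unary.get(ch, ()))
    for length in range(2, n + 1):          # span length: O(n)
        for i in range(n - length + 1):     # span start:  O(n)
            cell = table[i][length - 1]
            for split in range(1, length):  # split point: O(n)
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b, c in product(left, right):
                    cell |= binary.get((b, c), set())
    return start in table[0][n - 1]

# Hypothetical CNF grammar for a^n b^n:
#   S -> A B | A X,  X -> S B,  A -> a,  B -> b
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}
```

With this grammar, `cyk("aabb", UNARY, BINARY)` is true and `cyk("abab", UNARY, BINARY)` is false.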

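LL parsers cannot handle left recursion: with a rule like A -> A + B, a recursive-descent parser for A calls itself again before consuming any input, so it never stops.
Left recursion is a rule whose RHS begins with its own LHS non-terminal, either directly (A -> A + B) or indirectly through other rules.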
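
A small sketch of both the problem and the usual fix, assuming A -> A + B | B with B -> id (a hypothetical completion of the rule above). The direct translation recurses forever, which Python reports as a RecursionError. Rewriting the rule as A -> B A', A' -> + B A' | ε removes the left recursion, and the A' tail can then be written as a loop.

```python
def parse_B(tokens, pos):
    # B -> id
    if pos < len(tokens) and tokens[pos] == "id":
        return pos + 1
    raise SyntaxError(f"expected 'id' at position {pos}")

def parse_A_left_recursive(tokens, pos=0):
    # Direct translation that tries A -> A + B first: A calls itself at the
    # same position without consuming a token, so it never makes progress.
    pos = parse_A_left_recursive(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        pos = parse_B(tokens, pos + 1)
    return pos

def parse_A(tokens, pos=0):
    # A -> B A',  A' -> + B A' | ε  (left recursion removed);
    # the tail-recursive A' is written as a while loop.
    pos = parse_B(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        pos = parse_B(tokens, pos + 1)
    return pos  # index just past the parsed A
```

`parse_A(["id", "+", "id"])` returns 3 (all tokens consumed), while `parse_A_left_recursive(["id"])` raises RecursionError.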

The pairwise disjointness test is used to check whether a grammar can be LL parsed.
A grammar passes the test if, for every non-terminal, the FIRST sets of its RHSs are pairwise disjoint, i.e. no two RHSs of the same non-terminal can begin with the same terminal.
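
A minimal sketch of the test, assuming a grammar with no ε-productions (with ε-rules, FOLLOW sets also come into play). FIRST sets are computed by iterating to a fixed point, then every pair of alternatives for the same non-terminal is checked for overlap. The grammar encoding and the two example grammars are assumptions for illustration.

```python
from itertools import combinations

def first_sets(grammar):
    """FIRST set of each non-terminal, assuming no ε-productions.

    grammar: dict non-terminal -> list of RHSs (lists of symbols);
    any symbol that is not a key is treated as a terminal.
    """
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no FIRST set grows
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def pairwise_disjoint(grammar):
    """Return (non-terminal, rhs1, rhs2) for every overlapping pair."""
    first = first_sets(grammar)

    def first_of(rhs):
        return first[rhs[0]] if rhs[0] in grammar else {rhs[0]}

    conflicts = []
    for nt, alternatives in grammar.items():
        for a, b in combinations(alternatives, 2):
            if first_of(a) & first_of(b):
                conflicts.append((nt, a, b))
    return conflicts

# A -> aB | bAb | Bb,  B -> cB | d   (passes)
GOOD = {"A": [["a", "B"], ["b", "A", "b"], ["B", "b"]],
        "B": [["c", "B"], ["d"]]}
# A -> aB | BAb,  B -> aB | b        (fails: both A-alternatives can start with a)
BAD = {"A": [["a", "B"], ["B", "A", "b"]],
       "B": [["a", "B"], ["b"]]}
```

`pairwise_disjoint(GOOD)` is an empty list, while `pairwise_disjoint(BAD)` reports the two alternatives of A.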

Left factoring (pulling the common leading symbols of a rule's alternatives out into a new non-terminal) can help make a grammar LL-parsable, but not in all cases.
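
A sketch of one round of left factoring, assuming a hypothetical grammar encoding (a list of RHS symbol lists, no empty RHS) and a hypothetical naming scheme that appends primes for fresh non-terminals. Alternatives with the same first symbol are replaced by their longest common prefix followed by a fresh non-terminal whose alternatives are the leftover suffixes ([] stands for ε).

```python
def common_prefix(seqs):
    """Longest list of symbols that every sequence starts with."""
    prefix = []
    for symbols in zip(*seqs):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix

def left_factor(nt, alternatives):
    """One round of left factoring for non-terminal `nt`."""
    groups = {}
    for rhs in alternatives:
        groups.setdefault(rhs[0], []).append(rhs)  # group by first symbol
    result = {nt: []}
    for group in groups.values():
        if len(group) == 1:
            result[nt].append(group[0])            # nothing to factor
            continue
        prefix = common_prefix(group)
        fresh = nt + "'" * len(result)             # hypothetical naming: S', S'', ...
        result[nt].append(prefix + [fresh])
        result[fresh] = [rhs[len(prefix):] for rhs in group]  # [] means ε
    return result

IF_RULES = [["if", "E", "then", "S"],
            ["if", "E", "then", "S", "else", "S"],
            ["other"]]
```

Applied to the classic if/then/else rule, `left_factor("S", IF_RULES)` gives S -> if E then S S' | other and S' -> ε | else S. The result is still not LL(1): S' can be ε and `else` can also follow S', so on seeing `else` the parser cannot choose between the two S' alternatives (the dangling-else ambiguity).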

LR parsers are usually implemented as shift-reduce algorithms.
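
A minimal sketch of a table-driven shift-reduce (LR) parser, assuming the SLR(1) table below for the textbook expression grammar E -> E + T | T, T -> T * F | F, F -> ( E ) | id. The stack holds states; a shift pushes a new state, a reduce pops one state per RHS symbol and then follows the GOTO entry for the LHS. The table encoding and function names are assumptions for illustration. Note that the grammar is left-recursive, which is no problem for an LR parser.

```python
# Productions, numbered as in the ACTION table: number -> (lhs, len(rhs))
PRODUCTIONS = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * F
    4: ("T", 1),  # T -> F
    5: ("F", 3),  # F -> ( E )
    6: ("F", 1),  # F -> id
}
FOLLOW_T = ("+", "*", ")", "$")  # lookaheads for reductions to T or F

# ACTION[state][terminal]: ("s", state), ("r", production) or ("acc",)
ACTION = {
    0: {"id": ("s", 5), "(": ("s", 4)},
    1: {"+": ("s", 6), "$": ("acc",)},
    2: {"+": ("r", 2), "*": ("s", 7), ")": ("r", 2), "$": ("r", 2)},
    3: {t: ("r", 4) for t in FOLLOW_T},
    4: {"id": ("s", 5), "(": ("s", 4)},
    5: {t: ("r", 6) for t in FOLLOW_T},
    6: {"id": ("s", 5), "(": ("s", 4)},
    7: {"id": ("s", 5), "(": ("s", 4)},
    8: {"+": ("s", 6), ")": ("s", 11)},
    9: {"+": ("r", 1), "*": ("s", 7), ")": ("r", 1), "$": ("r", 1)},
    10: {t: ("r", 3) for t in FOLLOW_T},
    11: {t: ("r", 5) for t in FOLLOW_T},
}
# GOTO[state][non-terminal] -> state, used right after a reduction
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}

def lr_parse(tokens):
    """Return the production numbers in the order they were reduced."""
    tokens = tokens + ["$"]
    stack = [0]  # stack of states
    pos = 0
    reductions = []
    while True:
        state, tok = stack[-1], tokens[pos]
        action = ACTION[state].get(tok)
        if action is None:
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        if action[0] == "s":    # shift: consume token, push new state
            stack.append(action[1])
            pos += 1
        elif action[0] == "r":  # reduce: pop |rhs| states, then GOTO
            lhs, length = PRODUCTIONS[action[1]]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(action[1])
        else:                   # accept
            return reductions
```

`lr_parse(["id", "+", "id", "*", "id"])` returns `[6, 4, 2, 6, 4, 6, 3, 1]`; read backwards, those productions give the rightmost derivation of `id + id * id`.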
LR advantages:
  1. Works for nearly all grammars that describe programming languages
  2. Works on a larger class of grammars than most other bottom-up algorithms
  3. Detects syntax errors as early as possible in a left-to-right scan
  4. The LR grammars are a superset of the grammars parsable by LL parsers
