The document discusses converting an arbitrary context-free grammar (CFG) into Chomsky normal form, which restricts rules to be either A BC or A w. It explains that new intermediate non-terminals are introduced to distribute rules of length greater than 2 over several binary rules, such that rules like S ABC become S XC and X AB. A 4-step process is outlined for the conversion: 1) copy conforming rules, 2) convert terminals to non-terminals, 3) convert unit productions, and 4) make all rules with non-terminals on the right binary.
2. What if your grammar isnt binary?
As in the case of the TreeBank grammar?
Convert it to binary any arbitrary CFG
can be rewritten into Chomsky-Normal
Form automatically.
The resulting grammar accepts (and rejects)
the same set of strings as the original
grammar.
But the resulting derivations (trees) are
different.
We saw this in the last set of lecture notes
3. More specifically, we want our rules to be of
the form
A B C
Or
A w
That is, rules can expand to either 2 non-terminals
or to a single terminal.
4. Introduce new intermediate non-terminals
into the grammar that distribute rules with
length > 2 over several rules.
So S A B C turns into
S X C and
X A B
Where X is a symbol that doesnt
occur anywhere else in the the
grammar.
4
5. 1. Copy all conforming rules to the new
grammar unchanged
2. Convert terminals within rules to
dummy non-terminals
3. Convert unit productions
4. Make all rules with NTs on the right
binary
In lecture: what these mean; apply to
example on next two slides
5