
Simplifications of Context-Free Grammars

Fall 2004 COMP 335 1


A Substitution Rule

S → aB
A → aaA
A → abBc
B → aA
B → b

Substitute B → b

Equivalent grammar

S → aB | ab
A → aaA
A → abBc | abbc
B → aA
A Substitution Rule
S → aB | ab
A → aaA
A → abBc | abbc
B → aA

Substitute B → aA

Equivalent grammar

S → aB | ab | aaA
A → aaA
A → abBc | abbc | abaAc
In general:

A → xBz
B → y1

Substitute B → y1

Equivalent grammar

A → xBz | xy1z
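
The substitution rule can be sketched in Python. This is a minimal illustration, not part of the original slides: it assumes a grammar is stored as a dict from a variable to a set of right-hand-side strings, with one character per symbol. The function name `substitute` and the variables `grammar`, `step1`, `step2` are hypothetical.

```python
def substitute(grammar, var, body):
    """Return a copy of `grammar` extended by the substitution rule.

    For every production A -> x var z, the production A -> x body z is
    added. All original productions are kept, as in A -> xBz | xy1z.
    """
    result = {head: set(bodies) for head, bodies in grammar.items()}
    for head, bodies in grammar.items():
        for rhs in bodies:
            for i, symbol in enumerate(rhs):
                if symbol == var:
                    # Replace this single occurrence of `var` by `body`.
                    result[head].add(rhs[:i] + body + rhs[i + 1:])
    return result


# Assumed encoding of the example grammar from the slides.
grammar = {
    "S": {"aB"},
    "A": {"aaA", "abBc"},
    "B": {"aA", "b"},
}

step1 = substitute(grammar, "B", "b")
step1["B"].discard("b")  # the slides drop B -> b once it has been substituted
step2 = substitute(step1, "B", "aA")
```

Dropping B → b is safe in this example because no right-hand side contains B more than once; in general every combination of occurrences would have to be substituted before the production can be removed.
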
Nullable Variables

λ-production: A → λ

Nullable Variable: A ⇒ … ⇒ λ


Removing Nullable Variables

Example Grammar:

S → aMb
M → aMb
M → λ

Nullable variable: M


S → aMb
M → aMb
M → λ

Substitute M → λ

Final Grammar

S → aMb
S → ab
M → aMb
M → ab
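
A possible Python sketch of this step follows. It is an illustration under assumptions, not the slides' own procedure: grammars are dicts from a variable to a set of right-hand-side strings, one character per symbol, with the empty string standing for λ. The function names are hypothetical. If the start variable itself is nullable, λ is lost from the language, which matches the restriction to grammars that do not produce λ.

```python
from itertools import combinations


def nullable_variables(grammar):
    """Return the set of variables A with A => ... => lambda."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A body is nullable if all of its symbols are nullable
            # (the empty body "" is trivially nullable).
            if any(all(s in nullable for s in rhs) for rhs in bodies):
                nullable.add(head)
                changed = True
    return nullable


def remove_lambda_productions(grammar):
    """Substitute the nullable variables away and drop lambda-productions."""
    nullable = nullable_variables(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for rhs in bodies:
            positions = [i for i, s in enumerate(rhs) if s in nullable]
            # Drop every subset of the nullable occurrences.
            for r in range(len(positions) + 1):
                for dropped in combinations(positions, r):
                    candidate = "".join(
                        s for i, s in enumerate(rhs) if i not in dropped
                    )
                    if candidate:  # never re-introduce a lambda-production
                        new_bodies.add(candidate)
        result[head] = new_bodies
    return result


# Assumed encoding of the example grammar ("" stands for lambda).
example = {"S": {"aMb"}, "M": {"aMb", ""}}
final = remove_lambda_productions(example)
```
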


Unit-Productions

Unit Production: A → B

(a single variable on both sides)


Removing Unit Productions

Observation:

A → A is removed immediately


Example Grammar:

S → aA
A→a
A→ B
B→A
B → bb



S → aA
A → a
A → B
B → A
B → bb

Substitute A → B

S → aA | aB
A → a
B → A | B
B → bb


S → aA | aB
A → a
B → A | B
B → bb

Remove B → B

S → aA | aB
A → a
B → A
B → bb


S → aA | aB
A → a
B → A
B → bb

Substitute B → A

S → aA | aB | aA
A → a
B → bb


Remove repeated productions

S → aA | aB | aA
A → a
B → bb

Final grammar

S → aA | aB
A → a
B → bb
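
Unit productions can also be removed mechanically. The sketch below does not follow the step-by-step substitution shown above; it uses the unit-pair closure instead: for each variable, collect the variables reachable through unit productions alone, then copy over their non-unit productions. The dict-of-sets encoding and the function name are assumptions. The resulting grammar looks different from the final grammar above but generates the same language, {aa, abb}.

```python
def remove_unit_productions(grammar):
    """Remove unit productions A -> B using the unit-pair closure."""
    variables = set(grammar)
    result = {}
    for head in grammar:
        # Variables reachable from `head` through unit productions only.
        reachable = {head}
        stack = [head]
        while stack:
            current = stack.pop()
            for rhs in grammar[current]:
                if rhs in variables and rhs not in reachable:
                    reachable.add(rhs)
                    stack.append(rhs)
        # Keep every non-unit production of every reachable variable.
        result[head] = {
            rhs
            for var in reachable
            for rhs in grammar[var]
            if rhs not in variables
        }
    return result


# Assumed encoding of the example grammar (one character per symbol).
example = {"S": {"aA"}, "A": {"a", "B"}, "B": {"A", "bb"}}
no_units = remove_unit_productions(example)
```
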


Useless Productions

S → aSb
S → λ
S → A
A → aA     Useless Production

Some derivations never terminate...

S ⇒ A ⇒ aA ⇒ aaA ⇒ … ⇒ aa…aA ⇒ …
Another grammar:

S → A
A → aA
A → λ
B → bA     Useless Production

Not reachable from S


In general:

if S ⇒ … ⇒ xAy ⇒ … ⇒ w, with w ∈ L(G)
(w contains only terminals)

then variable A is useful

otherwise, variable A is useless


A production A → x is useless
if any of its variables is useless

Variables     Productions
              S → aSb
              S → λ
              S → A      useless
useless       A → aA     useless
useless       B → C      useless
useless       C → D      useless
Removing Useless Productions

Example Grammar:

S → aS | A | C
A→a
B → aa
C → aCb



First: find all variables that can produce
strings with only terminals

S → aS | A | C
A → a
B → aa
C → aCb

Round 1: {A, B}
Round 2: {A, B, S}     (because S → A)


Keep only the variables
that produce terminal symbols: {A, B, S}
(the remaining variables are useless)

S → aS | A | C
A → a
B → aa
C → aCb

Remove useless productions

S → aS | A
A → a
B → aa
Second: Find all variables
reachable from S

Use a Dependency Graph

S → aS | A
A → a
B → aa

Dependency graph: S → A; B is not reachable


Keep only the variables
reachable from S
(the remaining variables are useless)

S → aS | A
A → a
B → aa

Remove useless productions

Final Grammar

S → aS | A
A → a
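
The two passes above can be sketched in Python as follows. This is an illustration under assumptions: grammars are dicts from a variable to a set of right-hand-side strings, one character per symbol, uppercase letters are variables and lowercase letters are terminals. The function name `remove_useless` and the `start` parameter are hypothetical.

```python
def remove_useless(grammar, start="S"):
    """Remove useless productions in two passes (generating, then reachable)."""

    def ok(rhs, generating):
        # Every symbol is a terminal or an already-generating variable.
        return all(not s.isupper() or s in generating for s in rhs)

    # First: variables that can produce strings with only terminals.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in generating and any(ok(r, generating) for r in bodies):
                generating.add(head)
                changed = True
    pruned = {
        head: {rhs for rhs in bodies if ok(rhs, generating)}
        for head, bodies in grammar.items()
        if head in generating
    }

    # Second: variables reachable from the start variable
    # (a traversal of the dependency graph).
    reachable = {start} if start in pruned else set()
    stack = list(reachable)
    while stack:
        current = stack.pop()
        for rhs in pruned[current]:
            for s in rhs:
                if s in pruned and s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return {head: pruned[head] for head in reachable}


# Assumed encoding of the example grammar.
example = {"S": {"aS", "A", "C"}, "A": {"a"}, "B": {"aa"}, "C": {"aCb"}}
cleaned = remove_useless(example)
```

The order of the two passes matters: running the reachability pass first could leave behind variables that become unreachable only after the non-generating ones are removed.
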


Removing All

Step 1: Remove λ-productions

Step 2: Remove Unit-productions

Step 3: Remove Useless productions



Normal Forms for Context-free Grammars


Chomsky Normal Form

Each production has the form:

A → BC     (B and C are variables)
or
A → a      (a is a terminal)


Examples:

Chomsky Normal Form     Not Chomsky Normal Form
S → AS                  S → AS
S → a                   S → AAS
A → SA                  A → SA
A → b                   A → aa
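
The form can be checked mechanically. The sketch below assumes single-letter symbols, uppercase for variables and lowercase for terminals, with each right-hand side stored as a string; the names `is_cnf`, `in_cnf` and `not_in_cnf` are hypothetical.

```python
def is_cnf(grammar):
    """True if every production has the form A -> BC or A -> a."""

    def body_ok(rhs):
        two_variables = len(rhs) == 2 and rhs.isupper()
        one_terminal = len(rhs) == 1 and rhs.islower()
        return two_variables or one_terminal

    return all(body_ok(rhs) for bodies in grammar.values() for rhs in bodies)


# The two example grammars from the slide, in the assumed encoding.
in_cnf = {"S": {"AS", "a"}, "A": {"SA", "b"}}
not_in_cnf = {"S": {"AS", "AAS"}, "A": {"SA", "aa"}}
```
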


Conversion to Chomsky Normal Form

Example: S → ABa
A → aab
B → Ac

Not Chomsky
Normal Form



Introduce variables for terminals: Ta, Tb, Tc

S → ABa
A → aab
B → Ac

becomes

S → ABTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c
Introduce intermediate variable: V1

S → ABTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c

becomes

S → AV1
V1 → BTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c
Introduce intermediate variable: V2

S → AV1
V1 → BTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c

becomes

S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c
Initial grammar

S → ABa
A → aab
B → Ac

Final grammar in Chomsky Normal Form:

S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c
In general:

From any context-free grammar (which doesn't produce λ)
not in Chomsky Normal Form

we can obtain:
An equivalent grammar in Chomsky Normal Form


The Procedure

First remove:

Nullable variables

Unit productions



Then, for every terminal symbol a:

Add production Ta → a

In productions with two or more symbols on the right side: replace a with Ta

New variable: Ta
Replace any production A → C1C2…Cn (n > 2)

with A → C1V1
     V1 → C2V2
     …
     Vn−2 → Cn−1Cn

New intermediate variables: V1, V2, …, Vn−2
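
The procedure can be sketched in Python. This is a minimal illustration under assumptions: λ-productions and unit productions have already been removed, right-hand sides are tuples of symbol names, and the generated names ("T" plus the terminal, "V" plus a counter) do not clash with existing variables. The function name `to_cnf` is hypothetical, and variables Ta are only created for terminals that occur in right-hand sides of length two or more.

```python
def to_cnf(grammar, terminals):
    """Convert a grammar without lambda- and unit productions to CNF."""
    result = {}
    introduced = []  # terminals that received a variable T<a>
    counter = 0      # numbering of the intermediate variables V1, V2, ...
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:
                # Already of the form A -> a.
                result.setdefault(head, []).append(body)
                continue
            # Replace every terminal a by the variable Ta.
            symbols = []
            for s in body:
                if s in terminals:
                    symbols.append("T" + s)
                    if s not in introduced:
                        introduced.append(s)
                else:
                    symbols.append(s)
            # Split A -> C1 C2 ... Cn into a chain of binary productions.
            current = head
            while len(symbols) > 2:
                counter += 1
                fresh = "V" + str(counter)
                result.setdefault(current, []).append((symbols[0], fresh))
                current = fresh
                symbols = symbols[1:]
            result.setdefault(current, []).append(tuple(symbols))
    for a in introduced:
        result["T" + a] = [(a,)]
    return result


# Assumed encoding of the example: S -> ABa, A -> aab, B -> Ac.
example = {
    "S": [("A", "B", "a")],
    "A": [("a", "a", "b")],
    "B": [("A", "c")],
}
cnf = to_cnf(example, terminals={"a", "b", "c"})
```
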
Theorem: For any context-free grammar
(which doesn’t produce λ )
there is an equivalent grammar
in Chomsky Normal Form



Observations

• Chomsky normal forms are good
  for parsing and proving theorems

• It is very easy to find the Chomsky normal
  form for any context-free grammar


Greibach Normal Form

All productions have the form:

A → a V1V2…Vk     k ≥ 0

where a is a terminal symbol and V1, V2, …, Vk are variables


Examples:

Greibach Normal Form     Not Greibach Normal Form
S → cAB                  S → abSb
A → aA | bB | b          S → aa
B → b


Conversion to Greibach Normal Form:

S → abSb
S → aa

becomes

S → aTbSTb
S → aTa
Ta → a
Tb → b

(Greibach Normal Form)
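
The conversion in this example works because every right-hand side already starts with a terminal, so only the later terminals need to be replaced. The sketch below covers that special case only, not the general conversion. Right-hand sides are tuples of symbol names; the functions `is_gnf` and `replace_inner_terminals` and the naming scheme "T" plus the terminal are assumptions.

```python
def is_gnf(grammar, terminals):
    """True if every production is a terminal followed by variables only."""
    return all(
        len(body) >= 1
        and body[0] in terminals
        and all(s not in terminals for s in body[1:])
        for bodies in grammar.values()
        for body in bodies
    )


def replace_inner_terminals(grammar, terminals):
    """Replace each terminal after the first position by a variable T<a>.

    This yields Greibach Normal Form only if every right-hand side
    already starts with a terminal.
    """
    result = {}
    introduced = []
    for head, bodies in grammar.items():
        result[head] = []
        for body in bodies:
            new_body = list(body[:1])
            for s in body[1:]:
                if s in terminals:
                    new_body.append("T" + s)
                    if s not in introduced:
                        introduced.append(s)
                else:
                    new_body.append(s)
            result[head].append(tuple(new_body))
    for a in introduced:
        result["T" + a] = [(a,)]
    return result


# Assumed encoding of the example: S -> abSb | aa.
example = {"S": [("a", "b", "S", "b"), ("a", "a")]}
gnf = replace_inner_terminals(example, terminals={"a", "b"})
```
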
Theorem: For any context-free grammar
(which doesn’t produce λ )
there is an equivalent grammar
in Greibach Normal Form



Observations

• Greibach normal forms are very good
  for parsing

• It is hard to find the Greibach normal
  form of any context-free grammar


The CYK Parser



The CYK Membership Algorithm

Input:

• Grammar G in Chomsky Normal Form

• String w

Output:

Decide whether w ∈ L(G)
The Algorithm

Input example:

• Grammar G:
  S → AB
  A → BB
  A → a
  B → AB
  B → b

• String w: aabbb
Substrings of aabbb, by length:

a        a        b        b        b

aa       ab       bb       bb

aab      abb      bbb

aabb     abbb

aabbb


S → AB
A → BB
A → a
B → AB
B → b

a        a        b        b        b
A        A        B        B        B

aa       ab       bb       bb

aab      abb      bbb

aabb     abbb

aabbb
S → AB
A → BB
A → a
B → AB
B → b

a        a        b        b        b
A        A        B        B        B

aa       ab       bb       bb
∅        S,B      A        A

aab      abb      bbb

aabb     abbb

aabbb
S → AB
A → BB
A → a
B → AB
B → b

a        a        b        b        b
A        A        B        B        B

aa       ab       bb       bb
∅        S,B      A        A

aab      abb      bbb
S,B      A        S,B

aabb     abbb
A        S,B

aabbb
S,B
Therefore: aabbb ∈ L(G)

Time Complexity: |w|^3

Observation: The CYK algorithm can be
easily converted to a parser
(bottom-up parser)
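
The table-filling procedure can be sketched in Python. This is an illustration under assumptions: the grammar is in Chomsky Normal Form and stored as a dict from a variable to a set of right-hand-side strings, one character per symbol. The names `cyk_table` and `cyk_accepts` are hypothetical.

```python
def cyk_table(grammar, word):
    """Build the CYK table for `word`.

    table[length - 1][i] is the set of variables that derive the
    substring word[i : i + length].
    """
    n = len(word)
    table = [[set() for _ in range(n - length + 1)] for length in range(1, n + 1)]
    # Substrings of length 1: productions of the form A -> a.
    for i, ch in enumerate(word):
        table[0][i] = {head for head, bodies in grammar.items() if ch in bodies}
    # Longer substrings: try every split into a left and a right part.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[split - 1][i]
                right = table[length - split - 1][i + split]
                for head, bodies in grammar.items():
                    if any(
                        len(rhs) == 2 and rhs[0] in left and rhs[1] in right
                        for rhs in bodies
                    ):
                        table[length - 1][i].add(head)
    return table


def cyk_accepts(grammar, word, start="S"):
    """Decide whether `word` is in L(G); the empty word is rejected."""
    if not word:
        return False
    return start in cyk_table(grammar, word)[-1][0]


# The example grammar and string from the slides, in the assumed encoding.
G = {"S": {"AB"}, "A": {"BB", "a"}, "B": {"AB", "b"}}
table = cyk_table(G, "aabbb")
```

The three nested loops over substring length, start position and split point give the |w|^3 running time for a fixed grammar.
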
