
11-711 Algorithms for NLP

The Earley Parsing Algorithm

Reading: Jay Earley, "An Efficient Context-Free Parsing Algorithm", Comm. of the ACM, vol. 13 (2), pp. 94-102

The Earley Parsing Algorithm


General Principles:
- A clever hybrid Bottom-Up and Top-Down approach
- Bottom-Up parsing completely guided by Top-Down predictions
- Maintains sets of dotted grammar rules that:
  - Reflect what the parser has seen so far
  - Explicitly predict the rules and constituents that will combine into a complete parse
- Similar to Chart Parsing - partial analyses can be shared
- Time Complexity $O(n^3)$, but better on particular sub-classes
- Developed prior to Chart Parsing, first efficient parsing algorithm for general context-free grammars.

The Earley Parsing Method


Main Data Structure: The state (or item)
- A state is a dotted rule and starting position: $[A \to X_1 \ldots \bullet\, C \ldots X_m,\ j]$, where $j$ is the input position at which the rule was started
- The algorithm maintains sets of states, one set for each position in the input string (starting from 0)
- We denote the set for position $i$ by $S_i$
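One way to picture a state is as a small immutable record. The sketch below is only an illustration: the class name `State`, its field names, and the helper methods are assumptions introduced here, not notation from the lecture, and the dot is printed as `.` to keep the output plain ASCII.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class State:
    """A dotted rule plus the input position where the rule was started."""
    lhs: str                # left-hand side A of the rule
    rhs: Tuple[str, ...]    # right-hand side symbols X_1 ... X_m
    dot: int                # number of RHS symbols already recognized
    start: int              # starting position j of the rule

    def next_symbol(self) -> Optional[str]:
        """Symbol immediately after the dot, or None if the rule is complete."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def advance(self) -> "State":
        """The same rule with the dot moved one symbol to the right."""
        return State(self.lhs, self.rhs, self.dot + 1, self.start)

    def __str__(self) -> str:
        body = list(self.rhs[:self.dot]) + ["."] + list(self.rhs[self.dot:])
        return f"[{self.lhs} -> {' '.join(body)}, {self.start}]"


# Illustrative rule: one symbol recognized, rule started at position 0.
item = State("NP", ("art", "adj", "n"), 1, 0)
print(item)  # [NP -> art . adj n, 0]
```

Making the record frozen (hashable and comparable by value) is a design choice that lets duplicate states be detected cheaply when they are added to a set $S_i$.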

The Earley Parsing Algorithm


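Three Main Operations:
- Predictor: If state $[A \to X_1 \ldots \bullet\, C \ldots X_m,\ j] \in S_i$, then for every rule of the form $C \to Y_1 \ldots Y_k$, add to $S_i$ the state $[C \to \bullet\, Y_1 \ldots Y_k,\ i]$
- Completer: If state $[A \to X_1 \ldots X_m\, \bullet,\ j] \in S_i$, then for every state in $S_j$ of form $[B \to X_1 \ldots \bullet\, A \ldots X_k,\ l]$, add to $S_i$ the state $[B \to X_1 \ldots A\, \bullet \ldots X_k,\ l]$
- Scanner: If state $[A \to X_1 \ldots \bullet\, a \ldots X_m,\ j] \in S_i$ and the next input word is $x_{i+1} = a$, then add to $S_{i+1}$ the state $[A \to X_1 \ldots a\, \bullet \ldots X_m,\ j]$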

The Earley Recognition Algorithm


- Simplified version with no lookaheads and for grammars without epsilon-rules
- Assumes input is string of grammar terminal symbols
- We extend the grammar with a new rule $S' \to S\ \$$
- The algorithm sequentially constructs the sets $S_i$ for $0 \le i \le n+1$
- We initialize the set $S_0$ with $S_0 = \{[S' \to \bullet\, S\ \$,\ 0]\}$

The Earley Recognition Algorithm


The Main Algorithm: parsing input $x = x_1 \ldots x_n$
1. $S_0 = \{[S' \to \bullet\, S\ \$,\ 0]\}$
2. For $0 \le i \le n$ do: Process each item $s \in S_i$ in order by applying to it the single applicable operation among:
   (a) Predictor (adds new items to $S_i$)
   (b) Completer (adds new items to $S_i$)
   (c) Scanner (adds new items to $S_{i+1}$)
3. If $S_{i+1} = \emptyset$, Reject the input
4. If $i = n$ and $S_{n+1} = \{[S' \to S\ \$\, \bullet,\ 0]\}$ then Accept the input
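The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's own implementation: items are plain tuples `(lhs, rhs, dot, start)`, the names `earley_recognize`, `TOP`, and `END` are introduced here, the grammar is assumed to have no epsilon-rules, and neither `S'` nor `$` may already occur in it. The small grammar in the usage part is likewise only an illustrative assumption.

```python
from typing import List, Tuple

Rule = Tuple[str, Tuple[str, ...]]
Item = Tuple[str, Tuple[str, ...], int, int]  # (lhs, rhs, dot, start)


def earley_recognize(grammar: List[Rule], start: str, words: List[str]) -> bool:
    """Return True if `words` can be derived from `start` (no epsilon-rules)."""
    nonterminals = {lhs for lhs, _ in grammar}
    TOP, END = "S'", "$"          # added start symbol and end marker (assumed unused)
    x = list(words) + [END]       # x[i] is the word scanned from S_i into S_{i+1}
    n = len(words)
    chart: List[List[Item]] = [[] for _ in range(n + 2)]  # S_0 ... S_{n+1}

    def add(i: int, item: Item) -> None:
        if item not in chart[i]:  # never store the same state twice
            chart[i].append(item)

    add(0, (TOP, (start, END), 0, 0))                      # step 1
    for i in range(n + 1):                                 # step 2
        k = 0
        while k < len(chart[i]):                           # S_i may grow as we go
            lhs, rhs, dot, origin = chart[i][k]
            k += 1
            if dot == len(rhs):                            # Completer
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, o2))
            elif rhs[dot] in nonterminals:                 # Predictor
                for l2, r2 in grammar:
                    if l2 == rhs[dot]:
                        add(i, (l2, r2, 0, i))
            elif rhs[dot] == x[i]:                         # Scanner
                add(i + 1, (lhs, rhs, dot + 1, origin))
        if not chart[i + 1]:                               # step 3: reject
            return False
    return (TOP, (start, END), 2, 0) in chart[n + 1]       # step 4: accept


# Usage with a small illustrative grammar over part-of-speech tags.
GRAMMAR: List[Rule] = [
    ("S", ("NP", "VP")),
    ("NP", ("art", "adj", "n")),
    ("NP", ("art", "n")),
    ("NP", ("adj", "n")),
    ("VP", ("aux", "VP")),
    ("VP", ("v", "NP")),
]
print(earley_recognize(GRAMMAR, "S", "art adj n aux v art n".split()))  # True
print(earley_recognize(GRAMMAR, "S", "art n v".split()))                # False
```

Each set is kept as a list rather than a hash set so that items are processed in the order they were added, which mirrors the "process each item in order" wording of step 2; a production implementation would likely pair the list with a set for faster duplicate checks.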

Earley Recognition - Example


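The Grammar:
(1) $\text{S} \to \text{NP VP}$
(2) $\text{NP} \to \text{art adj n}$
(3) $\text{NP} \to \text{art n}$
(4) $\text{NP} \to \text{adj n}$
(5) $\text{VP} \to \text{aux VP}$
(6) $\text{VP} \to \text{v NP}$

The original input: $x$ = "The large can can hold the water"
POS assigned input: $x$ = art adj n aux v art n
Parser input: $x = \text{art adj n aux v art n } \$$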

Earley Recognition - Example


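The input: $x = \text{art adj n aux v art n } \$$

$S_0$:
- $[\text{S}' \to \bullet\ \text{S } \$,\ 0]$
- $[\text{S} \to \bullet\ \text{NP VP},\ 0]$
- $[\text{NP} \to \bullet\ \text{art adj n},\ 0]$
- $[\text{NP} \to \bullet\ \text{art n},\ 0]$
- $[\text{NP} \to \bullet\ \text{adj n},\ 0]$

$S_1$:
- $[\text{NP} \to \text{art}\ \bullet\ \text{adj n},\ 0]$
- $[\text{NP} \to \text{art}\ \bullet\ \text{n},\ 0]$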

Earley Recognition - Example


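The input: $x = \text{art adj n aux v art n } \$$

$S_1$:
- $[\text{NP} \to \text{art}\ \bullet\ \text{adj n},\ 0]$
- $[\text{NP} \to \text{art}\ \bullet\ \text{n},\ 0]$

$S_2$:
- $[\text{NP} \to \text{art adj}\ \bullet\ \text{n},\ 0]$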

Earley Recognition - Example


The input: $x = \text{art adj n aux v art n } \$$

$S_2$:
- $[\text{NP} \to \text{art adj}\ \bullet\ \text{n},\ 0]$

$S_3$:
- $[\text{NP} \to \text{art adj n}\ \bullet,\ 0]$

Earley Recognition - Example


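The input: $x = \text{art adj n aux v art n } \$$

$S_3$:
- $[\text{NP} \to \text{art adj n}\ \bullet,\ 0]$
- $[\text{S} \to \text{NP}\ \bullet\ \text{VP},\ 0]$
- $[\text{VP} \to \bullet\ \text{aux VP},\ 3]$
- $[\text{VP} \to \bullet\ \text{v NP},\ 3]$

$S_4$:
- $[\text{VP} \to \text{aux}\ \bullet\ \text{VP},\ 3]$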

Earley Recognition - Example


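The input: $x = \text{art adj n aux v art n } \$$

$S_4$:
- $[\text{VP} \to \text{aux}\ \bullet\ \text{VP},\ 3]$
- $[\text{VP} \to \bullet\ \text{aux VP},\ 4]$
- $[\text{VP} \to \bullet\ \text{v NP},\ 4]$

$S_5$:
- $[\text{VP} \to \text{v}\ \bullet\ \text{NP},\ 4]$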

Earley Recognition - Example


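The input: $x = \text{art adj n aux v art n } \$$

$S_5$:
- $[\text{VP} \to \text{v}\ \bullet\ \text{NP},\ 4]$
- $[\text{NP} \to \bullet\ \text{art adj n},\ 5]$
- $[\text{NP} \to \bullet\ \text{art n},\ 5]$
- $[\text{NP} \to \bullet\ \text{adj n},\ 5]$

$S_6$:
- $[\text{NP} \to \text{art}\ \bullet\ \text{adj n},\ 5]$
- $[\text{NP} \to \text{art}\ \bullet\ \text{n},\ 5]$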

Earley Recognition - Example


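The input: $x = \text{art adj n aux v art n } \$$

$S_6$:
- $[\text{NP} \to \text{art}\ \bullet\ \text{adj n},\ 5]$
- $[\text{NP} \to \text{art}\ \bullet\ \text{n},\ 5]$

$S_7$:
- $[\text{NP} \to \text{art n}\ \bullet,\ 5]$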

Earley Recognition - Example


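The input: $x = \text{art adj n aux v art n } \$$

$S_7$:
- $[\text{NP} \to \text{art n}\ \bullet,\ 5]$
- $[\text{VP} \to \text{v NP}\ \bullet,\ 4]$
- $[\text{VP} \to \text{aux VP}\ \bullet,\ 3]$
- $[\text{S} \to \text{NP VP}\ \bullet,\ 0]$
- $[\text{S}' \to \text{S}\ \bullet\ \$,\ 0]$

$S_8$:
- $[\text{S}' \to \text{S } \$\ \bullet,\ 0]$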

Time Complexity of Earley Algorithm


- Algorithm iterates for each word of input (i.e. $n$ iterations)
- How many items can be created and processed in $S_i$?
  - Each item in $S_i$ has the form $[A \to X_1 \ldots \bullet\, C \ldots X_m,\ j]$, $0 \le j \le i$
  - Thus $O(n)$ items
- The Scanner and Predictor operations on an item each require constant time
- The Completer operation on an item adds items of form $[B \to X_1 \ldots A\, \bullet \ldots X_k,\ l]$ to $S_i$, with $0 \le l \le i$, so it may require up to $O(n)$ time for each processed item
- Time required for each iteration ($S_i$) is thus $O(n^2)$
- Time bound on entire algorithm is therefore $O(n^3)$
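The cubic bound can be checked by summing the per-iteration cost over all sets. The short derivation below is a sketch of that arithmetic; the constant $c$ is introduced here only to stand for the grammar-dependent factors hidden by the $O(\cdot)$ notation.

```latex
\[
T(n) \;\le\; \sum_{i=0}^{n}
      \underbrace{c_1 (i+1)}_{\text{items in } S_i}
      \cdot
      \underbrace{c_2 (i+1)}_{\text{Completer cost per item}}
  \;=\; c \sum_{k=1}^{n+1} k^2
  \;=\; c\,\frac{(n+1)(n+2)(2n+3)}{6}
  \;=\; O(n^3),
  \qquad c = c_1 c_2 .
\]
```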

Time Complexity of Earley Algorithm


Special Cases:
- Completer is the operation that may require $O(i^2)$ time in iteration $i$
- For unambiguous grammars, Earley shows that the Completer operation will require at most $O(i)$ time
- Thus time complexity for unambiguous grammars is $O(n^2)$
- For some grammars, the number of items in each $S_i$ is bounded by a constant
- These are called bounded-state grammars and include even some ambiguous grammars
- For bounded-state grammars, the time complexity of the algorithm is linear, $O(n)$

Parsing with an Earley Parser


- As usual, we need to keep back-pointers to the constituents that we combine together when we complete a rule
- Each item must be extended to have the form $[A \to X_1(pt_1) \ldots \bullet\, C \ldots X_m,\ j]$, where the $pt_i$ are pointers to the already found RHS sub-constituents
- At the end - reconstruct parse from the back-pointers
- To maintain efficiency - we must do ambiguity packing
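Reconstructing a parse from back-pointers amounts to a recursive walk over the stored constituents. The sketch below assumes a hypothetical table that maps each constituent id to its label and the ids of its children; the table layout, the id numbering, and the function name `build_tree` are illustrative assumptions. It handles a single parse only and does not attempt ambiguity packing.

```python
from typing import Dict, List, Tuple

# Hypothetical constituent table: id -> (label, ids of the RHS sub-constituents).
# Leaves (input words / POS tags) have an empty child list.
Node = Tuple[str, List[int]]


def build_tree(nodes: Dict[int, Node], root: int) -> str:
    """Follow back-pointers from `root` and return a bracketed parse string."""
    label, children = nodes[root]
    if not children:
        return label
    inner = " ".join(build_tree(nodes, child) for child in children)
    return f"({label} {inner})"


# Illustrative table for the tag sequence: art adj n aux v art n
nodes: Dict[int, Node] = {
    1: ("art", []), 2: ("adj", []), 3: ("n", []),
    4: ("NP", [1, 2, 3]),
    5: ("aux", []), 6: ("v", []), 7: ("art", []), 8: ("n", []),
    9: ("NP", [7, 8]),
    10: ("VP", [6, 9]),
    11: ("VP", [5, 10]),
    12: ("S", [4, 11]),
}
print(build_tree(nodes, 12))
# (S (NP art adj n) (VP aux (VP v (NP art n))))
```

Storing children by id rather than by nested object is what would allow several items to share one sub-constituent, which is the property ambiguity packing relies on.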


Earley Parsing - Example


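The input: $x = \text{art adj n aux v art n } \$$

$S_0$:
- $[\text{S}' \to \bullet\ \text{S } \$,\ 0]$
- $[\text{S} \to \bullet\ \text{NP VP},\ 0]$
- $[\text{NP} \to \bullet\ \text{art adj n},\ 0]$
- $[\text{NP} \to \bullet\ \text{art n},\ 0]$
- $[\text{NP} \to \bullet\ \text{adj n},\ 0]$

$S_1$:
- $[\text{NP} \to \text{art}_1\ \bullet\ \text{adj n},\ 0]$
- $[\text{NP} \to \text{art}_1\ \bullet\ \text{n},\ 0]$

Constituents built: (1) art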

Earley Parsing - Example


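The input: $x = \text{art adj n aux v art n } \$$

$S_1$:
- $[\text{NP} \to \text{art}_1\ \bullet\ \text{adj n},\ 0]$
- $[\text{NP} \to \text{art}_1\ \bullet\ \text{n},\ 0]$

$S_2$:
- $[\text{NP} \to \text{art}_1\ \text{adj}_2\ \bullet\ \text{n},\ 0]$

Constituents built: (2) adj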

Earley Parsing - Example


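The input: $x = \text{art adj n aux v art n } \$$

$S_2$:
- $[\text{NP} \to \text{art}_1\ \text{adj}_2\ \bullet\ \text{n},\ 0]$

$S_3$:
- $[\text{NP} \to \text{art}_1\ \text{adj}_2\ \text{n}_3\ \bullet,\ 0]$

Constituents built: (3) n; (4) $\text{NP} \to \text{art}_1\ \text{adj}_2\ \text{n}_3$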

Earley Parsing - Example


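The input: $x = \text{art adj n aux v art n } \$$

$S_3$:
- $[\text{NP} \to \text{art}_1\ \text{adj}_2\ \text{n}_3\ \bullet,\ 0]$
- $[\text{S} \to \text{NP}_4\ \bullet\ \text{VP},\ 0]$
- $[\text{VP} \to \bullet\ \text{aux VP},\ 3]$
- $[\text{VP} \to \bullet\ \text{v NP},\ 3]$

$S_4$:
- $[\text{VP} \to \text{aux}_5\ \bullet\ \text{VP},\ 3]$

Constituents built: (5) aux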

Earley Parsing - Example


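The input: $x = \text{art adj n aux v art n } \$$

$S_4$:
- $[\text{VP} \to \text{aux}_5\ \bullet\ \text{VP},\ 3]$
- $[\text{VP} \to \bullet\ \text{aux VP},\ 4]$
- $[\text{VP} \to \bullet\ \text{v NP},\ 4]$

$S_5$:
- $[\text{VP} \to \text{v}_6\ \bullet\ \text{NP},\ 4]$

Constituents built: (6) v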

Earley Parsing - Example


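The input: $x = \text{art adj n aux v art n } \$$

$S_5$:
- $[\text{VP} \to \text{v}_6\ \bullet\ \text{NP},\ 4]$
- $[\text{NP} \to \bullet\ \text{art adj n},\ 5]$
- $[\text{NP} \to \bullet\ \text{art n},\ 5]$
- $[\text{NP} \to \bullet\ \text{adj n},\ 5]$

$S_6$:
- $[\text{NP} \to \text{art}_7\ \bullet\ \text{adj n},\ 5]$
- $[\text{NP} \to \text{art}_7\ \bullet\ \text{n},\ 5]$

Constituents built: (7) art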

Earley Parsing - Example


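The input: $x = \text{art adj n aux v art n } \$$

$S_6$:
- $[\text{NP} \to \text{art}_7\ \bullet\ \text{adj n},\ 5]$
- $[\text{NP} \to \text{art}_7\ \bullet\ \text{n},\ 5]$

$S_7$:
- $[\text{NP} \to \text{art}_7\ \text{n}_8\ \bullet,\ 5]$

Constituents built: (8) n; (9) $\text{NP} \to \text{art}_7\ \text{n}_8$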

Earley Parsing - Example


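The input: $x = \text{art adj n aux v art n } \$$

$S_7$:
- $[\text{NP} \to \text{art}_7\ \text{n}_8\ \bullet,\ 5]$
- $[\text{VP} \to \text{v}_6\ \text{NP}_9\ \bullet,\ 4]$
- $[\text{VP} \to \text{aux}_5\ \text{VP}_{10}\ \bullet,\ 3]$
- $[\text{S} \to \text{NP}_4\ \text{VP}_{11}\ \bullet,\ 0]$
- $[\text{S}' \to \text{S}_{12}\ \bullet\ \$,\ 0]$

Constituents built: (9) $\text{NP} \to \text{art}_7\ \text{n}_8$; (10) $\text{VP} \to \text{v}_6\ \text{NP}_9$; (11) $\text{VP} \to \text{aux}_5\ \text{VP}_{10}$; (12) $\text{S} \to \text{NP}_4\ \text{VP}_{11}$

$S_8$:
- $[\text{S}' \to \text{S}_{12}\ \$\ \bullet,\ 0]$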
