Chomsky
John Backus and Peter Naur developed BNF, which was first used to describe the syntax of Algol 60 and has since been used in the definitions of C, Java, and Ada
Basic Forms of BNFs
Original BNF
Extended BNF
Syntax Diagrams
Lexical Structure of Programming Languages (PL)
The lexical structure of a programming language is the structure of its words (or tokens)
It is separate from the syntactic structure, but it is closely related to it and in some cases can be an inextricable part of the syntax
Categories of Tokens
Reserved words (sometimes called keywords)
Literals or constants
Special symbols
Identifiers
Some Issues on Tokens
Reserved words are so named because identifiers cannot have the same character strings, e.g. double, if
Sometimes confusion can occur between reserved words and predefined identifiers
Some PLs give identifiers a fixed maximum size (some treat only the first six or eight characters as significant)
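The effect of a fixed maximum identifier size can be sketched in Python. This is only an illustration: the helper names `significant_name` and `find_collisions` and the default of six significant characters are assumptions made for the example, not the rule of any particular language.

```python
def significant_name(identifier, max_significant=6):
    """Return the part of the identifier the (hypothetical) language looks at."""
    return identifier[:max_significant]


def find_collisions(identifiers, max_significant=6):
    """Return pairs of distinct identifiers that share a significant prefix."""
    seen = {}        # significant prefix -> first identifier that used it
    collisions = []
    for name in identifiers:
        key = significant_name(name, max_significant)
        if key in seen and seen[key] != name:
            collisions.append((seen[key], name))
        else:
            seen.setdefault(key, name)
    return collisions
```

With six significant characters, `counter1` and `counter2` are treated as the same name; with eight, they are distinct.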
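A problem arises in determining where a token ends, e.g. doif (the identifier doif, or the reserved words do and if?) and x12 (the identifier x12, or the identifier x followed by the number 12?)
Maximum munch (the principle of longest substring) is the usual solution: at each point, the longest possible string of characters is collected into a single token
It also means that intervening characters, such as a blank, can make a difference: do if is not the same as doif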
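The doif and x12 examples can be tried out with a small scanner. The following Python sketch assumes a toy language with only two reserved words (do and if), identifiers, and unsigned integers; the names `RESERVED`, `TOKEN_RE`, and `tokenize` are invented for this illustration.

```python
import re

# Hypothetical toy language: two reserved words plus identifiers and integers.
RESERVED = {"do", "if"}

# Both alternatives are greedy, so each match takes the longest run of
# characters that can belong to one token (principle of longest substring).
TOKEN_RE = re.compile(
    r"\s*(?:(?P<word>[A-Za-z][A-Za-z0-9]*)|(?P<number>[0-9]+))"
)


def tokenize(text):
    """Return a list of (category, lexeme) pairs for the toy language."""
    tokens = []
    pos = 0
    text = text.rstrip()  # trailing white space never starts a token
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        if match.group("word") is not None:
            lexeme = match.group("word")
            # A word is classified only after it has been scanned in full.
            category = "reserved" if lexeme in RESERVED else "identifier"
        else:
            lexeme = match.group("number")
            category = "number"
        tokens.append((category, lexeme))
        pos = match.end()
    return tokens
```

Under these assumptions, `tokenize("doif")` yields a single identifier, while `tokenize("do if")` yields two reserved words; likewise `x12` is one identifier and `x 12` is an identifier followed by a number.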
The format of a program can affect the way tokens are recognized:
Maximum munch requires that some tokens be separated by token delimiters or white space
Indentation and the end of a line of text can be used to determine structure
Free-format language: a language in which format has no effect on the program structure
Fixed-format language: a language in which all tokens must occur in prespecified locations
Token Conventions of the C Language
The following is a quotation from the C manual by Kernighan and Ritchie (1988):
There are six classes of tokens: identifiers, keywords, constants, string literals, operators, and other separators. Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments as described below (collectively, "white space") are ignored except as they separate tokens. Some white space is required to separate otherwise adjacent identifiers, keywords, and constants. If the input stream has been separated into tokens up to a given character, the next token is the longest string of characters that could constitute a token.
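A well-known consequence of the last sentence of the quotation is that the C text x+++y is read as x ++ + y, not x + ++ y. The Python sketch below imitates that rule for identifiers and a small, assumed subset of C operators; `OPERATORS` and `split_tokens` are hypothetical names, and this is not a full C scanner.

```python
# Assumed subset of C operators, for illustration only.
OPERATORS = ["++", "--", "+=", "-=", "+", "-", "="]


def split_tokens(text):
    """Split text into identifiers and operators, longest token first."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            # White space is ignored except that it separates tokens.
            pos += 1
            continue
        if ch.isalpha() or ch == "_":
            # Extend the identifier as far as it can go.
            end = pos
            while end < len(text) and (text[end].isalnum() or text[end] == "_"):
                end += 1
            tokens.append(text[pos:end])
            pos = end
            continue
        # Among the operators that match here, take the longest one.
        candidates = [op for op in OPERATORS if text.startswith(op, pos)]
        if not candidates:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
        longest = max(candidates, key=len)
        tokens.append(longest)
        pos += len(longest)
    return tokens
```

Here `split_tokens("x+++y")` groups the first two plus signs into `++`, whereas inserting a blank, as in `"x+ ++y"`, changes the grouping.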
Context-Free Grammars and BNFs
Number 1: Generate 1 valid sentence
sentence → noun-phrase verb-phrase .
noun-phrase → article noun
article → a | the
noun → girl | dog
verb-phrase → verb noun-phrase
verb → sees | pets
Number 2: Using the grammar G below, generate 1 valid expr such that |expr| = 4
expr → expr + expr | expr * expr | ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Number 3: Discuss the pros and cons of ignoring or requiring white space when recognizing tokens
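As a way of checking an answer to Number 1, the sentence grammar can be encoded as a table and expanded mechanically. The Python sketch below is one possible encoding, not a prescribed method: the dictionary `GRAMMAR` and the functions `derive` and `generate_sentence` are names chosen for this illustration, and any symbol without a rule is assumed to be a terminal.

```python
import random

# Each nonterminal maps to its list of alternatives; each alternative is a
# sequence of symbols. Symbols with no entry are treated as terminals.
GRAMMAR = {
    "sentence": [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article": [["a"], ["the"]],
    "noun": [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb": [["sees"], ["pets"]],
}


def derive(symbol, rng):
    """Expand a symbol into a list of terminals, choosing alternatives at random."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: nothing left to replace
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(derive(part, rng))
    return words


def generate_sentence(seed=None):
    """Return one sentence derived from the start symbol 'sentence'."""
    rng = random.Random(seed)
    return " ".join(derive("sentence", rng))
```

Every result has the shape article noun verb article noun followed by a period. The same approach would not terminate reliably for the expr grammar in Number 2 without a depth limit, because that grammar is recursive.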