
SYNTAX

Reynald Jay Fernandez Hidalgo, MSCS


Professor

Syntax

It is the structure of a language


Can be described using formal systems

Formal Systems of Describing Syntax

Context-free grammars were developed by Noam Chomsky
John Backus and Peter Naur developed BNF (Backus-Naur Form), which was first
used to describe Algol 60 and has since been used to define C, Java, and Ada

Basic Forms of BNFs

Original BNF
Extended BNF
Syntax Diagrams

Lexical Structure of Programming Languages

The structure of a language's words (or tokens)
Separate from syntactic structure, but closely related to it; in some cases
it can be an inextricable part of syntax

Categories of Tokens

Reserved words (sometimes called keywords)
Literals or Constants
Special symbols
Identifiers
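
The four categories above can be illustrated with a small scanner. This is only a sketch under stated assumptions: the reserved-word set, the symbol set, and the names TOKEN_SPEC and classify are hypothetical choices for a tiny C-like language, not taken from any real compiler.

```python
import re

# Hypothetical reserved words for a small C-like language (an assumption).
RESERVED = {"if", "else", "while", "double", "int", "return"}

# One pattern per lexical category; "word" covers both identifiers and
# reserved words, which are told apart after matching.
TOKEN_SPEC = [
    ("literal", r"\d+(?:\.\d+)?"),                 # numeric constants
    ("word", r"[A-Za-z_]\w*"),                     # identifier or reserved word
    ("symbol", r"==|<=|>=|!=|[-+*/=<>;(){}]"),     # special symbols
    ("space", r"\s+"),                             # white space, skipped
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def classify(source):
    """Return a list of (category, text) pairs for the given source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "space":
            continue
        if kind == "word":
            # A word is a reserved word only if it is in the fixed set.
            kind = "reserved" if text in RESERVED else "identifier"
        tokens.append((kind, text))
    return tokens


print(classify("if (x1 >= 42) y = 3.5;"))
```

Looking a word up in a fixed set after it has been matched is one common way to separate reserved words from identifiers; other designs give each reserved word its own pattern.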

Some Issues on Tokens

Reserved words are so named because identifiers cannot use the same
character strings
e.g. double, if
Confusion can sometimes occur between reserved words and predefined variables
Some programming languages give identifiers a fixed maximum size (some keep
only the first six or eight characters)

Some Issues on Tokens


A problem arises in determining where a token ends
e.g. doif
     x12
Maximum munch (principle of longest substring): the scanner takes the longest
string of characters that can form a token. This settles where a token ends,
and it also means that intervening characters (such as a blank) can make a
difference
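
The longest-substring rule can be sketched as follows, using the two examples from this slide. The reserved-word set, the pattern list, and the function name scan are assumptions made for illustration only.

```python
import re

# Hypothetical reserved words, chosen to match the slide's "doif" example.
RESERVED = {"do", "if"}

# Candidate token patterns; every one is tried at each position.
PATTERNS = [
    ("word", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("number", re.compile(r"[0-9]+")),
]


def scan(source):
    """Split source into tokens, always taking the longest possible match."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1  # white space only separates tokens
            continue
        best_kind, best_text = None, ""
        for kind, pattern in PATTERNS:
            match = pattern.match(source, pos)
            if match and len(match.group()) > len(best_text):
                best_kind, best_text = kind, match.group()
        if best_kind is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if best_kind == "word":
            best_kind = "reserved" if best_text in RESERVED else "identifier"
        tokens.append((best_kind, best_text))
        pos += len(best_text)
    return tokens


print(scan("doif"))   # [('identifier', 'doif')]
print(scan("do if"))  # [('reserved', 'do'), ('reserved', 'if')]
print(scan("x12"))    # [('identifier', 'x12')]
print(scan("x 12"))   # [('identifier', 'x'), ('number', '12')]
```

In this sketch the blank in "do if" and "x 12" is what changes the result, which is the sense in which intervening characters make a difference.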

Some Issues on Tokens


The format of a program can affect the way tokens are recognized
e.g. Maximum munch requires that some tokens be separated by token
delimiters or white space
e.g. Indentation, or the end of a line of text, can be used to determine
structure
Free-format language: a language in which format has no effect on the
program structure
Fixed-format language: a language in which all tokens must occur in
prespecified locations

Token Conventions of the C Language

The following is a quotation from the C manual by Kernighan and Ritchie (1988):

There are six classes of tokens: identifiers, keywords, constants, string
literals, operators, and other separators. Blanks, horizontal and vertical
tabs, newlines, formfeeds, and comments as described below (collectively,
"white space") are ignored except as they separate tokens. Some white space
is required to separate otherwise adjacent identifiers, keywords, and
constants. If the input stream has been separated into tokens up to a given
character, the next token is the longest string of characters that could
constitute a token.

Context-Free Grammars and BNFs

Number 1: Generate 1 valid sentence
sentence → noun-phrase verb-phrase .
noun-phrase → article noun
article → a | the
noun → girl | dog
verb-phrase → verb noun-phrase
verb → sees | pets
Number 2: Using the grammar G below, generate 1 valid expr
such that /expr/ = 4
expr → expr + expr | expr * expr | ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Number 3: Discuss the pros and cons of ignoring or requiring white space
when recognizing tokens
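
Generating a sentence from the grammar in Number 1 means repeatedly replacing a non-terminal with one of its alternatives until only words remain. The sketch below does this mechanically; the dictionary layout and the names GRAMMAR and generate are illustrative assumptions, and the alternative chosen at each step is random.

```python
import random

# Grammar from Number 1: each non-terminal maps to a list of alternatives,
# and each alternative is a sequence of symbols.
GRAMMAR = {
    "sentence": [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article": [["a"], ["the"]],
    "noun": [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb": [["sees"], ["pets"]],
}


def generate(symbol, rng):
    """Expand symbol into a list of terminal words."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminals stand for themselves
    alternative = rng.choice(GRAMMAR[symbol])
    words = []
    for part in alternative:
        words.extend(generate(part, rng))
    return words


rng = random.Random()
print(" ".join(generate("sentence", rng)))
```

Every sentence this grammar produces has the shape article noun verb article noun followed by a period, so the language is finite. The expression grammar in Number 2 is recursive, and the same random expansion applied to it is not guaranteed to stay short.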

Why Context-Free

Each production has a single non-terminal on its left-hand side, so a
non-terminal can be replaced regardless of the context in which it appears
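
The left-hand-side condition can be checked mechanically. In this sketch a production is a pair of tuples (left-hand side, right-hand side); the representation and the name is_context_free are assumptions for illustration.

```python
def is_context_free(productions, nonterminals):
    """True if every production has exactly one non-terminal on its left."""
    return all(
        len(lhs) == 1 and lhs[0] in nonterminals
        for lhs, _rhs in productions
    )


NONTERMINALS = {"sentence", "noun-phrase", "verb-phrase",
                "article", "noun", "verb"}

# Productions taken from the sentence grammar: one non-terminal on the left.
context_free = [
    (("sentence",), ("noun-phrase", "verb-phrase", ".")),
    (("article",), ("a",)),
    (("article",), ("the",)),
]

# A made-up rule whose left-hand side has two symbols, so the replacement
# of "article" depends on the "noun" that follows it.
context_sensitive = context_free + [
    (("article", "noun"), ("the", "girl")),
]

print(is_context_free(context_free, NONTERMINALS))       # True
print(is_context_free(context_sensitive, NONTERMINALS))  # False
```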

ISO 14977

ISO standard for Extended BNF (EBNF) notation, adopted in 1996
e.g. sentence = noun phrase , verb phrase , "." ;
