
Language Fundamentals-I

(Character set, keywords, identifiers, constants, variables)

First, master the fundamentals. - Larry Bird

Chapter Outline: Introduction, Character Set, Tokens, Keywords, Identifiers, Literals, Data types, Variables, Type qualifiers, Conclusion

I long to accomplish a great and noble task, but it is my chief duty to accomplish small tasks as if they were great and noble. - Helen Keller

Success is neither magical nor mysterious. Success is the natural consequence of consistently applying the basic fundamentals. - Jim Rohn

2.1. Introduction
Steps in learning the English language: Alphabets → Words → Sentences → Paragraphs → Stories

Steps in learning the C language: Character set → Tokens → Instructions → Functions → Programs

Character set: A character denotes any alphabet, digit, white space or special symbol that is used to represent information. A character set is a collection of characters.

Token: A token is the smallest individual unit of a program.

Instruction: An instruction is a statement given to the computer to perform a specific operation.

Function: A function is a collection of instructions that performs a particular task.

Program: A program is a well-organized collection of instructions used to communicate with the computer system to accomplish a desired objective.

2.2. Character Set


When we write a program, we write it as a collection of text lines made up of characters taken from a fixed collection of characters. This collection is called the character set. A C program can be written using the following character set:

Alphabets: abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
Digits: 0123456789
Special symbols:

Symbol   Meaning
{        Opening curly brace
}        Closing curly brace
(        Opening parenthesis
)        Closing parenthesis
[        Opening square bracket
]        Closing square bracket
.        Dot or period
?        Question mark
|        Pipe
_        Underscore
'        Apostrophe
"        Double quotation mark
~        Negation or tilde
!        Exclamation
#        Pound or number or hash
%        Mod
;        Semi-colon
:        Colon
,        Comma
=        Assigns to
^        Caret or exclusive OR
&        Ampersand
*        Asterisk
+        Plus
-        Minus or hyphen
/        Forward slash
\        Backward slash
>        Greater than
<        Less than

White space characters: blank space, horizontal tab, new line, carriage return, vertical tab, form feed

2.3. Tokens
A token is the smallest individual unit (or element) of a program. The tokens used in a program are:
1. Keywords
2. Identifiers
3. Literals (constants)
4. Variables
5. Operators

2.3.1. Keywords

Each language comes with a set of words. Because these words play a key role in developing a program, they are termed keywords. Keywords are the built-in words whose meanings are already explained to the compiler. Each keyword has its own definition, fixed by the language developers, and a C compiler recognizes each keyword and interprets it according to that predefined meaning. Keywords are also called reserved words. Each keyword has its own purpose and should be used only for that purpose. There are 3 types of keywords:

Type-related keywords: int, float, char, double, long, short, signed, unsigned, void, struct, union, typedef, sizeof, enum, const, volatile

Storage-related keywords: auto, static, extern, register

Control-flow related keywords: if, else, switch, default, case, while, do, for, break, continue, goto, return

It is important to note that all the keywords are written in lowercase. Some compilers also reserve some or all of the following additional keywords: ada, asm, entry, far, fortran, huge, near, pascal.

QUESTIONS

1. Which of the following are keywords in C? a) int b) register c) switch d) boolean

2. Which of the following is a keyword in C? a) Int b) int c) Integer d) integer

3. Which of the following is not a keyword in C? a) volatile b) enum c) constant d) sizeof

2.3.2. Identifiers
Just as predefined names (keywords) are needed to develop a C program, user-defined names are also needed. These user-defined names are called identifiers. Identifiers are the names given to various program elements such as variables, constants, arrays, functions, pointers, etc. These names are chosen by the user as and when needed; meaningful identifiers make a program easy to understand. To define an identifier, one should follow these rules:

Rule #1: An identifier should not be a keyword.
Ex: salary, name, GOTO (valid); int, goto (invalid)

Rule #2: The first character of an identifier must be a letter or an underscore (_). The subsequent characters may be letters, digits or underscores.
Ex: pradeep, K123, _abc (valid); 1Raj (invalid)

Rule #3: No special symbol other than the underscore (_) may be used, and no spaces are allowed in an identifier.
Ex: gross_sal, Ravi_varma, Rect_area (valid); Gross salary, s.i., profit&loss (invalid)

Rule #4: Upper- and lowercase letters in an identifier are distinct (different).
Ex: amount, Amount, aMOUnt and AMOUNT are four different identifiers.

Rule #5: An identifier can be arbitrarily long, but only a limited number of its leading characters may be significant. Some older implementations of C recognize only the first eight characters; the C89 standard guarantees at least 31 significant characters for internal names (C99 raises this to 63), and most modern compilers recognize many more.

QUESTIONS

1) Which of the following are valid and invalid identifiers? Give reasons if not valid.
1) record1 2) $tax 3) name 4) name-and-address 5) 1record 6) name and address 7) name_and_address 8) 123-45-6789 9) return 10) file_3 11) _master 12) _123 13) Ravi&Bro.

2) Assume that your C compiler recognizes only the first 8 characters of an identifier. Which of the following are valid and invalid identifiers?
1) Master_minds 2) char 3) s.i. 4) SimpleInterest 5) string 6) char1 7) identifier_1 8) ANSWER 9) answer 10) number#1

2.3.3. Constants (or) Literals


A program needs some input to process. While the instructions are being processed, the input must be stored in memory; the input stored in memory is called a literal. A constant or literal is a value that is input by the user to a program. The value may be a character, a string, an integer or a floating-point number. There are two types of constants: numeric constants and non-numeric constants. As the names imply, a numeric constant is a collection of digits, and a non-numeric constant is a collection of characters from the character set.

Classification of constants:
Numeric constants: integer constants, real constants
Non-numeric constants: character constants, string constants

Note: The word constant in C has two meanings: 1. The value that remains unchanged (or fixed) during the execution of program. 2. The value that is being input to a program.

2.3.3.1. Integer constants

An integer constant (either positive or negative) is taken to be:
1. A decimal integer constant, if it consists of the digits 0-9 and does not begin with 0. E.g., 98334 and -3456 are valid decimal integer constants.
2. An octal integer constant, if it begins with 0 (the digit zero) and contains only the digits 0-7 (no 8 or 9). E.g., 0534 and 035 are valid octal integer constants.
3. A hexadecimal integer constant, if the digits are preceded by 0x (or 0X) and are drawn from 0-9 and A-F (or a-f). E.g., 0xFACE and 0X124c are valid hexadecimal integer constants.

An integer constant may be suffixed by the letter u (or U) to specify that it is unsigned (non-negative only). It may also be suffixed by the letter l (or L) to specify that it is long (a wider integer type). In the absence of any suffix, the data type of an integer constant is derived from its value. Examples of integer constants:

Integer constant | Description
5000U | Unsigned decimal integer constant
123456789L | Long decimal integer constant
0235353l | Long octal integer constant
0x23FA3dU | Unsigned hexadecimal integer constant
0XFFFFFFFUL | Unsigned long hexadecimal integer constant
0243UL | Unsigned long octal integer constant
123245353UL | Unsigned long decimal integer constant

2.3.3.2. Floating-point constants or Real constants

A floating-point constant can be expressed in either of two notations:

Decimal notation: In this notation, the floating-point number is represented as a whole number followed by a decimal point and a fractional part. Digits may be omitted before or after the decimal point (but not both). A floating-point constant can include one of the suffixes f, F, l or L. Examples of floating-point constants (decimal notation):

Real constant | Description
2.3456 | Double-precision floating-point constant (the default)
2.3456F | Single-precision floating-point constant
2.3456L | Long double precision floating-point constant

Here, precision refers to the number of significant decimal digits a type can represent reliably.

Exponential notation: Exponential notation is useful for representing numbers whose magnitudes are very large or very small. It consists of a mantissa, the letter e or E, and an exponent. The mantissa is either an integer or a real number expressed in decimal notation, and it may be preceded by a sign (+ or -). The exponent is an integer with an optional sign. Examples of floating-point constants (exponential notation):

Number | In powers of 10 | Exponential notation
53876 | 5.3876 x 10^4 | 5.3876e4
0.00000000004 | 4 x 10^-11 | 4E-11
100000 | 1 x 10^5 | 1e+5
0.007321 | 7.321 x 10^-3 | 7.321E-3
32000 | 3.2 x 10^4 | 3.2E4
0.0000005 | 0.5 x 10^-6 | 0.5E-6

It should be understood that integer constants are exact quantities, whereas floating-point constants are approximations. For example, the floating-point constant 0.1 has no exact binary representation, so it is stored in the computer's memory as a value very slightly different from 0.1, even though it appears as 0.1 when displayed on the screen (because of automatic rounding). Therefore, floating-point values should not be used for purposes such as counting and indexing, where exact values are required.

2.3.3.3. Character constants

A character constant is a sequence of one or more characters enclosed in single quotes. The character may be a letter, digit, special symbol or blank space. The value of a character constant containing only one character is the numeric value of that character in the machine's character set at execution time. The value of a multi-character constant is implementation-defined. Ex: 'a', 'abc', '9', '@', '123', '\0' (valid); a&b (invalid)

It is important to note that a character constant cannot directly contain the ' (single quote) character or a new line. To represent these and certain other characters, the following escape sequences (or backslash character constants) may be used:

Backslash character constant | Description
\n | New line
\t | Horizontal tab
\v | Vertical tab
\b | Backspace
\r | Carriage return
\f | Form feed
\a | Audible alert (bell)
\\ | Backslash
\? | Question mark
\' | Single quote
\" | Double quote
\ooo | Octal number
\xhh | Hexadecimal number

The escape sequence \ooo consists of a backslash followed by 1, 2 or 3 octal digits, which specify the value of the desired character. A common example of this construction is \0 (not followed by any other digit), which specifies the character NUL. The escape sequence \xhh consists of a backslash followed by x and then hexadecimal digits, which specify the value of the desired character. There is no limit on the number of hexadecimal digits, but the behavior is undefined if the resulting value exceeds that of the largest character.

2.3.3.4. String constants

A string constant is a sequence of characters surrounded by double quotes. The characters may be letters, digits, escape sequences and spaces. In C, a string is represented as an array of characters terminated by a null character (\0). Ex: "234"

Note:
1) A string constant never directly contains a " (double quotation mark) or a new line. To include them, use their escape sequences \" and \n.
2) Adjacent string literals are concatenated into a single string. After concatenation, a null byte \0 is appended to the string so that a program reading the string can find its end.

Ex: "Rama&Co." (valid); "Civil Engineering" typed with an actual line break between Civil and Engineering inside the quotes (invalid, because the string contains a new line)

Interview question #1
What is the difference among 1, '1' and "1"?
1 is a decimal integer constant of type int; it occupies 2 or 4 bytes depending on the execution environment (i.e., on the processor and compiler). '1' is a character constant whose value is the character code of 1 (49 in ASCII); it fits in a single byte when stored in a char variable, although in C the constant itself has type int. "1" is a string constant that occupies 2 bytes: one byte containing the ASCII code of the character 1 and one byte for the null character (value 0) that marks the end of the string.

QUESTIONS

1) Which of the following are valid and invalid integer constants? Give reasons if not valid.
1) 123.34 2) 0893 3) -2345 4) 0x123 5) 3458UL 6) 2345l 7) 0124 8) 0XFAGE

2) Which of the following are valid and invalid floating-point constants? Give reasons if not valid.
1) -934 2) 0345 3) -89.34 4) 9E+3 5) 67.84L 6) 89.342f 7) 0.3E-4 8) 89. 9) .89

3) Which of the following are valid and invalid character constants? Give reasons if not valid.
1) a 2) { 3) 0 4) 5) \m 6) \023 7) \x3456 8) , 9) 134.3 10) 435

4) Which of the following are valid and invalid string constants? Give reasons if not valid.
1) Master minds 2) 234-567-466 3) King & queen 4) C is brilliant 5) he told- I miss you 6) Ravi's friend

2.3.4. Data types
A data type is a classification of data that states the possible values it can take, how those values are stored, and what operations are allowed on them.

In simple terms, a data type is a set of values together with the operations on those values.

2.3.4.1. Primitive data types
There are 5 basic data types in C. The size and range of each of these data types may vary among processor types and compilers. The following table shows the primitive data types in C (typical values for a 16-bit compiler):

Data type | Size (in bytes) | Range
int | 2 bytes or one word (varies from one compiler to another) | -32768 to +32767
float | 4 bytes or one word | -3.4e38 to +3.4e38 with 6 digits of precision
double | 8 bytes or two words | -1.7e308 to +1.7e308 with at least 10 (typically 15) digits of precision
char | 1 byte | -128 to +127
void | no size (sizeof(void) is not valid in standard C) | Valueless

Type modifiers: Except for void, the primitive data types may be preceded by type modifiers. Type modifiers are keywords that modify the behavior of the existing primitive data types, although not every combination is allowed (for example, short applies only to int, long applies to int and double, and signed and unsigned apply only to char and int). There are two types of modifiers:

Size modifiers: These modify the number of bytes a primitive data type occupies, and therefore the minimum and maximum values it can hold. The size modifiers are long and short.

Sign modifiers: These modify whether a primitive data type can hold negative values. The sign modifiers are signed and unsigned.

Size modifiers: A compiler can choose appropriate sizes for the operating system and hardware it targets, subject to the following rules:
a) shorts are at least 2 bytes long.
b) longs are at least 4 bytes long.
c) shorts are never bigger than ints.
d) ints are never bigger than longs.

Type | 16-bit compiler | 32-bit compiler
short | 2 | 2
int | 2 | 4
long | 4 | 4

Sign modifiers: If the unsigned modifier precedes a primitive data type, variables of that type accept only non-negative values (zero and positive). If the signed modifier precedes it, variables of that type accept both positive and negative values. The following table lists the data types including type modifiers (16-bit compiler):

Data type | Size (in bytes) | Range
char / signed char | 1 | -128 to +127
unsigned char | 1 | 0 to 255
int / signed int / short int / signed short int | 2 | -32768 to +32767
unsigned int / unsigned short int | 2 | 0 to 65535
long int / signed long int | 4 | -2,147,483,648 to +2,147,483,647
unsigned long int | 4 | 0 to 4,294,967,295
float | 4 | -3.4e38 to +3.4e38 with 6 digits of precision
double | 8 | -1.7e308 to +1.7e308 with at least 10 (typically 15) digits of precision
long double | 10 | -1.1e4932 to +1.1e4932 with about 18 digits of precision

Whether plain char is signed or unsigned is implementation-defined; it is signed on many common compilers.

2.3.4.2. User-defined data types: These are data types defined by the user according to their needs, built from the primitive data types. The user-defined data types include struct, union and enum.

2.3.5. Variables
The value that is input to a program is held by an entity known as a variable. A variable is associated with locations in memory based on the type of the input. E.g., if the input is a floating-point value, then a variable of type float is associated with 4 bytes. Each of these bytes has its own address. Rather than remembering these addresses, it is convenient to give the location a name. Therefore, a variable is a named location in memory that holds a value, and that value may vary during the execution of a program.

Ex: f=1.8*c+32
In this formula, 1.8 and 32 are fixed values: they do not change. The values of f and c, however, can change each time the formula is used. Hence, f and c are treated as variables.

Declaring a variable (declarative instruction): All variables must be declared before they are used in the program. A variable declaration tells the compiler two things:
1. The name(s) of the variable(s).
2. The type of value each will hold, which determines how much memory is reserved for it.
Usually, the declarative instruction is written before all the executable statements. It has the following syntax:
[Storage class] <data type> <variable name(s)>;

In this syntax, the content in square brackets is optional and the content in angle brackets is mandatory. The parts are separated by spaces, and the declarative instruction must always end with a semicolon.

The storage class specifies the default value of the variable(s), their storage location, and their scope and lifetime. The storage classes are auto, extern, register and static.

The data type is a keyword that specifies the type of data held by the variable(s).

The variable name is any legal identifier; in other words, it must follow the rules for identifiers. If there is more than one variable of the same type, separate the names with commas.

Ex:
int a,b,c;        //a, b and c are integer variables
long double m;    //m is a long double variable
char sex;         //sex is a character variable
char name[20];    //name is a character array that can hold 19 characters (plus the terminating \0)

Initializing a variable: Initializing a variable is the process of assigning a value to the variable. This can be done in either of the following ways:
[Storage class] <data type> <variable name>=<value>;    ...(1)
(or)
<variable name>=<value>;    ...(2)

In these two syntaxes, we see the assignment operator (=), which assigns the value of its right operand to its left operand. In the second syntax, the variable must have been declared earlier. Ex: int a=20; is equivalent to int a; a=20;

While initializing a variable, note the following: if the variable and the assigned value are of different types, the assigned value is converted to the type of the variable. E.g., in float a=20; the variable a is a floating-point variable and 20 is an integer. After this initialization, a holds 20.000000, a floating-point value.

QUESTIONS

1. Write appropriate declarations for each group of variables and arrays:
a) Integer variables: p, q; Floating-point variables: x, y, z; Character variables: a, b, c
b) Long integer variable: counter; Short integer variable: flag; Unsigned integer variable: cust_no
c) Double-precision variables: gross, tax, net; 80-element character array: message
2. Write the declarations of the variables required for calculating simple interest.
3. Write appropriate initialization statements for the following (initial values in parentheses):
a) Integer variable: a (120); Floating-point variable: x (32.34f)
b) Character variable: c ('p'); 15-element character array: name ("pradeep")
4. Write appropriate initialization statements for your details (rollno, name, age, sex, fees).

2.3.6. Type qualifiers

Type qualifiers are keywords that add new meanings to existing data types. There are two type qualifiers: const and volatile.

Making a variable read-only: To keep the value of a variable unchanged during the execution of a program, initialize the variable with the type qualifier const as follows:
[Storage class] const <data type> <variable name>=<value>;
Ex: const double PI=3.14159;
This initialization tells the compiler that the value of PI must not be modified by the program. However, PI can be used on the right-hand side of an assignment statement like any other variable. Ex: double x; x=PI;

Making a variable modifiable externally: To allow a variable's value to be changed at any time by external sources (from outside the program), we use the type qualifier volatile. For example, volatile int x; The value of x may be altered by external factors even if x never appears on the left-hand side of an assignment statement. When a variable is declared volatile, the compiler must not assume that its value is unchanged; it reads the variable from memory each time the variable is accessed, so that any external alteration is seen.

2.4. Conclusion

Every C program is typically a collection of functions. A function is a collection of instructions that performs a specific task. The instructions in a function are made up of words and symbols, which are collectively known as tokens. Hence, tokens are the smallest individual units of a program.
