Relational Algebra

Introduction to Relational Database


The relational database model originated from the mathematical concepts of a relation and set theory. It was first proposed as an approach to data modeling by Dr. Edgar F. Codd of IBM Research in 1970, in his paper "A Relational Model of Data for Large Shared Data Banks".
Relational databases became operational only in the mid-1980s. The main reasons for the delay in development and implementation of the model were:

- The need to develop efficient implementations of simple relational operations.
- The need for automatic query optimization.
- The unavailability of efficient software techniques.
- The requirement of increased processing power.
- The requirement of increased I/O speed.

Structure of Relational Database


The structure of a relational database rests on the following basic concepts. A relation can be viewed as a two-dimensional table.

- Attribute: a column of the table; a relation has a fixed number of columns.
- Tuple: a row of the table; a relation has a variable number of rows.
- Cardinality: the total number of rows.
- Degree: the total number of attributes.
- Domain: a set of atomic values.
- Key: a unique identifier.

Relational Algebra
Relational algebra is a collection of operations for the relational model that manipulate or access relations. These operations enable a user to specify retrieval requests. The result of a retrieval is a new relation, which may have been formed from one or more relations. The algebra operations thus produce new relations, which can be further manipulated using operations of the same algebra.

Unary Relational Operation - SELECT


The SELECT operation is used to select the subset of the tuples of a relation that satisfy a selection condition. It acts as a filter: tuples that satisfy the qualifying condition are kept, and all other tuples are discarded.

The general form of the SELECT operation is given as:

SELECT table (or relation) name <where predicate(s)> Into RESULT

In general, the select operation is denoted by

σ<selection condition>(R)

The symbol σ (sigma) is used to denote the SELECT operator.
The selection condition is a Boolean expression specified on the attributes of relation R.
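As a minimal sketch of the idea, SELECT can be modelled in Python as a filter over a list of tuples. The `select` helper and the EMPLOYEE rows below are invented for illustration and are not part of the original notes.

```python
def select(relation, predicate):
    """sigma: return the tuples of `relation` that satisfy `predicate`."""
    return [t for t in relation if predicate(t)]

# Hypothetical EMPLOYEE tuples, invented for this example.
EMPLOYEE = [
    {"FNAME": "John", "DNO": 5, "SALARY": 30000},
    {"FNAME": "Alicia", "DNO": 4, "SALARY": 25000},
    {"FNAME": "Ramesh", "DNO": 5, "SALARY": 38000},
]

# sigma_{DNO = 5 AND SALARY > 35000}(EMPLOYEE)
result = select(EMPLOYEE, lambda t: t["DNO"] == 5 and t["SALARY"] > 35000)
print(result)
```

The result keeps every attribute of the input tuples; only the number of rows changes.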

SELECT Operation Properties

The SELECT operation σ<selection condition>(R) produces a relation S that has the same schema as R.

The SELECT operation is commutative:

σ<condition1>(σ<condition2>(R)) = σ<condition2>(σ<condition1>(R))

A cascaded SELECT operation may be applied in any order:

σ<condition1>(σ<condition2>(σ<condition3>(R))) = σ<condition2>(σ<condition3>(σ<condition1>(R)))

A cascaded SELECT operation may be replaced by a single selection with a conjunction of all the conditions:

σ<condition1>(σ<condition2>(σ<condition3>(R))) = σ<condition1> AND <condition2> AND <condition3>(R)
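These properties can be checked on a small example. The sketch below assumes a made-up relation R(A, B) and three arbitrary conditions; it compares two cascade orders with a single conjunctive selection.

```python
def select(relation, predicate):
    """sigma: keep the tuples that satisfy `predicate`."""
    return [t for t in relation if predicate(t)]

# Made-up relation R(A, B) with 16 tuples.
R = [{"A": a, "B": b} for a in range(4) for b in range(4)]

c1 = lambda t: t["A"] > 0
c2 = lambda t: t["B"] < 3
c3 = lambda t: (t["A"] + t["B"]) % 2 == 0

# sigma_c1(sigma_c2(sigma_c3(R)))
cascade_123 = select(select(select(R, c3), c2), c1)
# sigma_c2(sigma_c3(sigma_c1(R)))
cascade_231 = select(select(select(R, c1), c3), c2)
# sigma_{c1 AND c2 AND c3}(R)
single = select(R, lambda t: c1(t) and c2(t) and c3(t))

print(cascade_123 == cascade_231 == single)
```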

Unary Relational Operation - PROJECT


The PROJECT operation selects certain columns from the table and discards the other columns.

The general form of the PROJECT operation is given as:

PROJECT table (or relation) name ON column name(s) Into RESULT

In general, the project operation is denoted by

π<attribute list>(R)

The symbol π (pi) is used to represent the PROJECT operator. <attribute list> is the desired list of attributes from the attributes of relation R.

Example: To list each employee's first name, last name and salary, the following query is used:

πLNAME, FNAME, SALARY(EMPLOYEE)
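A rough Python sketch of PROJECT follows. The `project` helper and the EMPLOYEE rows are assumptions made for the example. Because a relation is a set, the helper drops duplicate rows; the last lines also check that projecting onto a sub-list after a wider projection gives the same result as projecting directly.

```python
def project(relation, attributes):
    """pi: keep only the listed columns and drop duplicate rows."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:              # duplicate elimination
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

# Hypothetical EMPLOYEE tuples; two employees share name and salary.
EMPLOYEE = [
    {"SSN": "111", "FNAME": "John", "LNAME": "Smith", "SALARY": 30000},
    {"SSN": "222", "FNAME": "Joyce", "LNAME": "English", "SALARY": 25000},
    {"SSN": "333", "FNAME": "John", "LNAME": "Smith", "SALARY": 30000},
]

names = project(EMPLOYEE, ["LNAME", "FNAME", "SALARY"])   # no key in the list
with_key = project(EMPLOYEE, ["SSN", "LNAME"])            # SSN is a key here

nested = project(project(EMPLOYEE, ["LNAME", "SALARY"]), ["LNAME"])
direct = project(EMPLOYEE, ["LNAME"])
print(len(names), len(with_key), nested == direct)
```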

PROJECT Operation Properties

The PROJECT operation removes any duplicate tuples (duplicate elimination), so its result is a set of tuples and hence a valid relation.

The number of tuples in the result of the projection π<list>(R) is always less than or equal to the number of tuples in R. If the list of attributes includes a key of R, then the number of tuples in the result is equal to the number of tuples in R.

As long as <list2> contains the attributes in <list1>,

π<list1>(π<list2>(R)) = π<list1>(R)

The two lists cannot in general be swapped; that is, the PROJECT operation is not commutative.

Unary Relational Operation - RENAME


The RENAME operator is denoted by ρ (rho). The general RENAME operation can be expressed in any of the following forms:

- ρS(B1, B2, ..., Bn)(R) is a renamed relation S based on R, with column names B1, B2, ..., Bn.
- ρS(R) is a renamed relation S based on R, which does not specify column names.
- ρ(B1, B2, ..., Bn)(R) is a renamed relation with column names B1, B2, ..., Bn, which does not specify a new relation name.
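As an informal illustration, renaming the columns of a relation can be sketched in Python with a mapping from old to new attribute names. The `rename` helper and the sample relation are hypothetical.

```python
def rename(relation, mapping):
    """rho: return a copy of `relation` with attributes renamed per `mapping`."""
    return [{mapping.get(k, k): v for k, v in t.items()} for t in relation]

R = [{"A1": 1, "A2": "x"}, {"A1": 2, "A2": "y"}]   # made-up relation R(A1, A2)

# rho_{S(B1, B2)}(R): the result plays the role of the renamed relation S.
S = rename(R, {"A1": "B1", "A2": "B2"})
print(S)
```

The tuples themselves are unchanged; only the attribute names differ, and R is left untouched.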

Relational Algebra Operations from Set Theory - UNION

The result of this operation, denoted by R ∪ S, is a relation that includes all tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated. The two operands must be type compatible.

The general form of UNION is given as:

UNION tablename1, tablename2 Into RESULT

Example: To retrieve the social security numbers of all employees who either work in department 5 or directly supervise an employee who works in department 5, we can use the union operation as follows:

DEP5_EMPS ← σDNO=5(EMPLOYEE)
RESULT1 ← πSSN(DEP5_EMPS)
RESULT2(SSN) ← πSUPERSSN(DEP5_EMPS)
RESULT ← RESULT1 ∪ RESULT2
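The steps of this example can be traced in Python with sets. The EMPLOYEE rows below are invented; only the attribute names SSN, SUPERSSN and DNO come from the example.

```python
# Hypothetical EMPLOYEE tuples as (SSN, SUPERSSN, DNO).
EMPLOYEE = [
    ("123", "333", 5),
    ("333", "888", 5),
    ("999", "987", 4),
    ("453", "333", 5),
]

dep5_emps = [e for e in EMPLOYEE if e[2] == 5]   # sigma_{DNO=5}(EMPLOYEE)
result1 = {e[0] for e in dep5_emps}              # pi_{SSN}(DEP5_EMPS)
result2 = {e[1] for e in dep5_emps}              # pi_{SUPERSSN}, renamed to SSN
result = result1 | result2                       # union; duplicates collapse

print(sorted(result))
```

Employee "333" both works in department 5 and supervises someone there, yet appears once in the result because a relation is a set.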

Relational Algebra Operations from Set Theory - INTERSECTION & MINUS

INTERSECTION Operation

The result of this operation, denoted by R ∩ S, is a relation that includes all tuples that are in both R and S. The two operands must be type compatible.

The general form of INTERSECTION is given as:

INTERSECTION tablename1, tablename2 Into RESULT

DIFFERENCE (or MINUS) Operation

The result of this operation, denoted by R − S, is a relation that includes all tuples that are in R but not in S. The two operands must be type compatible.

The general form of DIFFERENCE is given as:

DIFFERENCE tablename2, tablename1 Into RESULT

Relational Algebra Operations from Set Theory

Type Compatibility

The operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) must have the same number of attributes. The domains of corresponding attributes must be compatible:

dom(Ai) = dom(Bi) for i = 1, 2, ..., n

The resulting relation for R1 ∪ R2, R1 ∩ R2, or R1 − R2 has the same attribute names as the first operand relation R1.

Notice that both UNION and INTERSECTION are commutative operations:

R ∪ S = S ∪ R and R ∩ S = S ∩ R

Both UNION and INTERSECTION can be treated as n-ary operations applicable to any number of relations, as both are associative:

R ∪ (S ∪ T) = (R ∪ S) ∪ T and R ∩ (S ∩ T) = (R ∩ S) ∩ T

The MINUS operation is not commutative. In general,

R − S ≠ S − R
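The set operations and their algebraic laws map directly onto Python's built-in `set` type. The sketch below uses three made-up, type-compatible relations of (FNAME, LNAME) tuples; the relation names are assumptions for the example.

```python
# Hypothetical type-compatible relations: each holds (FNAME, LNAME) tuples.
STUDENT = {("Susan", "Yao"), ("Ramesh", "Shah"), ("Amy", "Ford")}
INSTRUCTOR = {("Susan", "Yao"), ("John", "Smith")}
TA = {("Amy", "Ford"), ("John", "Smith")}

union = STUDENT | INSTRUCTOR           # R union S
both = STUDENT & INSTRUCTOR            # R intersect S
only_students = STUDENT - INSTRUCTOR   # R minus S

commutative = ((STUDENT | INSTRUCTOR) == (INSTRUCTOR | STUDENT)
               and (STUDENT & INSTRUCTOR) == (INSTRUCTOR & STUDENT))
associative = (STUDENT | (INSTRUCTOR | TA)) == ((STUDENT | INSTRUCTOR) | TA)
minus_differs = (STUDENT - INSTRUCTOR) != (INSTRUCTOR - STUDENT)

print(commutative, associative, minus_differs)
```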

Relational Algebra Operations from Set Theory - CARTESIAN PRODUCT

The Cartesian product is used to combine tuples from two relations in a combinatorial fashion. The resulting relation takes each tuple of the first relation and combines it with every tuple of the second relation. Hence, if R has nR tuples and S has nS tuples, then R × S will have nR * nS tuples.

The Cartesian product is denoted by R(A1, A2, ..., An) × S(B1, B2, ..., Bm). Its result is a relation Q of degree n + m, with the attributes in the order Q(A1, A2, ..., An, B1, B2, ..., Bm).

The two operands do NOT have to be type compatible.
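The tuple count and degree of a Cartesian product can be illustrated with `itertools.product` from the Python standard library. The relations R and S below are invented for the example.

```python
from itertools import product

R = [(1, "a"), (2, "b"), (3, "c")]      # R(A1, A2): 3 made-up tuples
S = [("x", True), ("y", False)]         # S(B1, B2): 2 made-up tuples

# Q(A1, A2, B1, B2): every tuple of R concatenated with every tuple of S.
Q = [r + s for r, s in product(R, S)]

print(len(Q), len(Q[0]))
```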

Generally, CROSS PRODUCT on its own is not a meaningful operation, but it can become meaningful when followed by other operations.

Example:

FEMALE_EMPS ← σSEX='F'(EMPLOYEE)
EMPNAMES ← πFNAME, LNAME, SSN(FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σSSN=ESSN(EMP_DEPENDENTS)
RESULT ← πFNAME, LNAME, DEPENDENT_NAME(ACTUAL_DEPENDENTS)

In this example, EMP_DEPENDENTS contains every combination of EMPNAMES and DEPENDENT tuples, whether or not they are actually related. The selection SSN=ESSN then keeps only the combinations in which the dependent belongs to the employee.

Binary Relational Operations - JOIN


The sequence of a CARTESIAN PRODUCT followed by a SELECT is used quite commonly to identify and select related tuples from two relations. A special operation called JOIN (denoted by ⋈) combines the two steps. It is very important for any relational database, since it allows us to process relationships among relations.

The general form of the JOIN operation is given as:

JOIN table (relation) name With table (relation) name ON (or OVER) domain name Into RESULT

In general, the join operation on two relations R(A1, A2, ..., An) and S(B1, B2, ..., Bm) is denoted by:

R ⋈<join condition> S

where R and S can be any relations.

Example: Suppose we want to retrieve the name of the manager of each department. To do so, we combine each DEPARTMENT tuple with the EMPLOYEE tuple whose SSN value matches the MGRSSN value in the department tuple:

DEPT_MGR ← DEPARTMENT ⋈MGRSSN=SSN EMPLOYEE

MGRSSN=SSN is the join condition. It combines each department record with the employee who manages the department. The join condition can also be specified as DEPARTMENT.MGRSSN = EMPLOYEE.SSN.
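A minimal sketch of this join in Python is a nested loop over both relations that keeps the pairs satisfying the join condition, which is the same as a selection applied to the Cartesian product. The `join` helper and the sample rows are assumptions; merging the dictionaries relies on the two relations having different attribute names.

```python
def join(r, s, condition):
    """Theta join: merge each pair from r x s that satisfies `condition`."""
    return [{**a, **b} for a in r for b in s if condition(a, b)]

# Hypothetical rows; only the attribute names come from the example.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "MGRSSN": "333"},
    {"DNAME": "Administration", "DNUMBER": 4, "MGRSSN": "987"},
]
EMPLOYEE = [
    {"SSN": "333", "LNAME": "Wong"},
    {"SSN": "987", "LNAME": "Wallace"},
    {"SSN": "123", "LNAME": "Smith"},
]

# DEPT_MGR: DEPARTMENT joined with EMPLOYEE on MGRSSN = SSN
DEPT_MGR = join(DEPARTMENT, EMPLOYEE, lambda d, e: d["MGRSSN"] == e["SSN"])
print([(t["DNAME"], t["LNAME"]) for t in DEPT_MGR])
```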

EQUIJOIN Operation

The most common use of join involves join conditions with equality (=) comparisons only. In the result of an EQUIJOIN we always have one or more pairs of attributes (whose names need not be identical) that have identical values in every tuple. The JOIN seen in the previous example was an EQUIJOIN.

NATURAL JOIN Operation

NATURAL JOIN (denoted by *) requires that the two join attributes, or each pair of corresponding join attributes, have the same name in both relations. If this is not the case, a renaming operation is applied first.

Binary Relational Operations - NATURAL JOIN

Example: To apply a natural join on the DNUMBER attributes of DEPARTMENT and DEPT_LOCATIONS, it is sufficient to write

DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS

The only attribute with the same name in both relations is DNUMBER. An implicit join condition is created based on this attribute:

DEPARTMENT.DNUMBER = DEPT_LOCATIONS.DNUMBER

Another example:

Q ← R(A,B,C,D) * S(C,D,E)

The implicit join condition includes each pair of attributes with the same name:

R.C = S.C AND R.D = S.D

The result keeps only one attribute of each such pair: Q(A,B,C,D,E)
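The second example can be sketched in Python by matching tuples on whatever attribute names the two relations share. The `natural_join` helper and the tuple values are hypothetical.

```python
def natural_join(r, s):
    """Join tuples that agree on every attribute name they have in common."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()          # shared attribute names
            if all(a[k] == b[k] for k in common):
                result.append({**a, **b})         # one copy of each shared column
    return result

# Made-up tuples for R(A, B, C, D) and S(C, D, E).
R = [{"A": 1, "B": 2, "C": 3, "D": 4}, {"A": 5, "B": 6, "C": 7, "D": 8}]
S = [{"C": 3, "D": 4, "E": 9}, {"C": 3, "D": 0, "E": 10}]

Q = natural_join(R, S)
print(Q)
```

The second S tuple matches on C but not on D, so it does not contribute to Q.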

Binary Relational Operations - DIVISION


Let relation R be defined over the attribute set Z and relation S over the attribute set X, such that X ⊆ Z. Let Y = Z − X, that is, Y is the set of attributes of R that are not attributes of S. The result of the DIVISION operation R(Z) ÷ S(X) is a relation T(Y). For a tuple t to appear in the result T of the DIVISION, the values in t must appear in R in combination with every tuple in S.

The general form of the DIVISION operation is given as:

DIVISION tablename2, tablename1 Into RESULT

Examples of the DIVISION Operation


(a) SSNS ← SSN_PNOS ÷ SMITH_PNO
(b) T ← R ÷ S
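Example (a) can be sketched in Python for a binary relation. The relation names follow the example above, but the `divide` helper and all tuple values are invented for illustration.

```python
def divide(r, s):
    """R(Y, X) divided by S(X): the Y values paired in R with every X in S."""
    candidates = {y for (y, _) in r}
    return {y for y in candidates if all((y, x) in r for x in s)}

# Hypothetical data: (SSN, PNO) pairs, and the project numbers to match.
SSN_PNOS = {
    ("123", 1), ("123", 2),
    ("666", 3),
    ("453", 1), ("453", 2), ("453", 3),
}
SMITH_PNO = {1, 2}

SSNS = divide(SSN_PNOS, SMITH_PNO)
print(sorted(SSNS))
```

Employee "666" is excluded because it is paired with neither project 1 nor project 2.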

Additional Relational Operations


OUTER JOIN Operations

In NATURAL JOINs (or INNER JOINs), only matching tuples are included in the join result. We may want a tuple from one of the relations to appear in the result even when there is no matching value in the other relation. This can be accomplished by the OUTER JOIN operation.

- The left outer join operation (R ⟕ S) keeps every tuple in the first or left relation R; if no matching tuple is found in S, then the attributes of S in the join result are filled with NULL values.
- A similar operation, the right outer join (R ⟖ S), keeps every tuple in the second or right relation S; if no matching tuple is found in R, then the attributes of R in the join result are filled with NULL values.
- A third operation, the full outer join (R ⟗ S), keeps all tuples in both the left and the right relations; tuples for which no match is found are padded with NULL values as needed.

Outer Join Example
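The original figure for this example is not available here. As a stand-in, the following sketch shows a left outer join in Python, with `None` playing the role of NULL. The helper name, its parameters and the sample rows are all assumptions made for illustration.

```python
def left_outer_join(r, s, condition, s_attributes):
    """Keep every tuple of r; pad with None when no tuple of s matches."""
    result = []
    for a in r:
        matches = [b for b in s if condition(a, b)]
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            # No partner in s: fill the attributes of s with NULL (None).
            result.append({**a, **{k: None for k in s_attributes}})
    return result

# Hypothetical rows: only one employee manages a department.
EMPLOYEE = [{"SSN": "333", "LNAME": "Wong"}, {"SSN": "123", "LNAME": "Smith"}]
DEPARTMENT = [{"DNAME": "Research", "MGRSSN": "333"}]

rows = left_outer_join(
    EMPLOYEE, DEPARTMENT,
    lambda e, d: e["SSN"] == d["MGRSSN"],
    ["DNAME", "MGRSSN"],
)
for row in rows:
    print(row)
```

An inner join on the same data would return only the row for Wong; the outer join also keeps Smith, with the department attributes set to `None`.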

Aggregate Functions and Grouping

Aggregate functions are used to specify mathematical aggregate functions on collections of values from the database. Examples include retrieving the average or total salary of all employees. Common functions applied to collections of numeric values include SUM, AVERAGE, COUNT, MAXIMUM, and MINIMUM.

Example: Let's assume that we have a table named Account with three columns, namely Account_Number, Branch_Name and Balance. To find the highest balance of all accounts regardless of branch, we could simply write

ℱ Max(Balance)(Account)

To find the maximum balance of each branch, we group by Branch_Name:

Branch_Name ℱ Max(Balance)(Account)
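Both queries can be mimicked in Python. The column names come from the example; the account rows are invented, and the grouping is done by hand with a `defaultdict`.

```python
from collections import defaultdict

# Hypothetical Account tuples: (Account_Number, Branch_Name, Balance).
Account = [
    ("A-101", "Downtown", 500),
    ("A-215", "Mianus", 700),
    ("A-102", "Downtown", 900),
    ("A-305", "Mianus", 350),
]

# Max(Balance) over the whole relation.
highest = max(balance for (_, _, balance) in Account)

# Max(Balance) grouped by Branch_Name.
by_branch = defaultdict(list)
for _, branch, balance in Account:
    by_branch[branch].append(balance)
max_per_branch = {branch: max(balances) for branch, balances in by_branch.items()}

print(highest, max_per_branch)
```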

Relational Algebra Operations

Complete Set Of Relational Operations


The set of operations SELECT, PROJECT, UNION, MINUS and CARTESIAN PRODUCT is called a complete set of relational operations, because any other relational algebra expression can be expressed by a combination of these five operations. For example:

R ∩ S = (R ∪ S) − ((R − S) ∪ (S − R))
R ⋈<join condition> S = σ<join condition>(R × S)

Some operations are included for convenience rather than necessity. The DIVISION operation R(Z) ÷ S(X), with Y = Z − X, can be expressed as a sequence of π, × and − operations as given below:

T1 ← πY(R)
T2 ← πY((S × T1) − R)
Result ← T1 − T2
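Both rewritings can be verified on small made-up relations. In this sketch R holds (Y, X) pairs, so the product is built as T1 × S to keep the same column order as R; that ordering is a convenience of the tuple representation.

```python
from itertools import product

# Division through projection, product and difference.
R = {("a", 1), ("a", 2), ("b", 1), ("c", 1), ("c", 2)}   # R(Y, X), invented
S = {1, 2}                                               # S(X)

T1 = {y for (y, _) in R}                                 # pi_Y(R)
missing = set(product(T1, S)) - R                        # (T1 x S) - R
T2 = {y for (y, _) in missing}                           # pi_Y of the above
result = T1 - T2                                         # Y values paired with all of S

# Intersection through union and difference.
A, B = {1, 2, 3}, {2, 3, 4}
same = (A & B) == (A | B) - ((A - B) | (B - A))

print(sorted(result), same)
```

T2 collects the Y values that are missing at least one pairing with S, so subtracting it from T1 leaves exactly the values that divide cleanly.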

Relational Calculus
Relational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus, that are part of the relational model for databases and provide a declarative way to specify database queries. This is in contrast to the relational algebra, which is also part of the relational data model but provides a more procedural way of specifying queries. Thus, in a relational calculus, there is no description of how to evaluate a query; a relational calculus query specifies what is to be retrieved rather than how to retrieve it. The relational algebra and the relational calculus are essentially logically equivalent: for any algebraic expression, there is an equivalent expression in the calculus, and vice versa.

The relational algebra might suggest these steps to retrieve the phone numbers and names of book stores that supply "Some Sample Book":

- Join book stores and titles over the BookstoreID.
- Restrict the result of that join to tuples for the book "Some Sample Book".
- Project the result of that restriction over StoreName and StorePhone.

The relational calculus would formulate the query in a descriptive, declarative way: Get StoreName and StorePhone for supplies such that there exists a title BK with the same BookstoreID value and with a BookTitle value of "Some Sample Book".

Tuple Relational Calculus


The tuple relational calculus was originally proposed by Dr. Codd in 1972. In the tuple relational calculus, tuples are found for which a predicate is true. The calculus is based on the use of tuple variables.

To specify the range of a tuple variable R as the EMPLOYEE relation, we write:

EMPLOYEE(R)

To express the query "Find the set of all tuples R such that F(R) is true", we write:

{ R | F(R) }

Example: To find the first and last names of all employees whose salary is above 50,000:

{ t.Fname, t.Lname | EMPLOYEE(t) AND t.salary > 50000 }

The condition EMPLOYEE(t) specifies that the range relation of tuple variable t is EMPLOYEE.
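A Python set comprehension has much the same declarative shape as a tuple calculus expression: it states which tuples qualify, not how to scan for them. The sketch below mirrors the salary query; the EMPLOYEE rows are invented.

```python
# Hypothetical EMPLOYEE tuples; attribute names follow the query.
EMPLOYEE = [
    {"Fname": "John", "Lname": "Smith", "salary": 30000},
    {"Fname": "James", "Lname": "Borg", "salary": 55000},
    {"Fname": "Jennifer", "Lname": "Wallace", "salary": 43000},
]

# { t.Fname, t.Lname | EMPLOYEE(t) AND t.salary > 50000 }
answer = {(t["Fname"], t["Lname"]) for t in EMPLOYEE if t["salary"] > 50000}
print(answer)
```

Here `for t in EMPLOYEE` plays the role of the range condition EMPLOYEE(t), and the `if` clause is the rest of the formula.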

Existential and Universal Quantifiers

Two special symbols called quantifiers can appear in formulas: the existential quantifier (∃) and the universal quantifier (∀). Informally, a tuple variable t is bound if it is quantified, meaning that it appears in an (∃t) or (∀t) clause; otherwise it is free.

[There exists] (∃t)(F) is true if the formula F evaluates to true for some (at least one) tuple assigned to free occurrences of t in F; otherwise (∃t)(F) is false.

[For all] (∀t)(F) is true if the formula F evaluates to true for every tuple assigned to free occurrences of t in F; otherwise (∀t)(F) is false.

For example: There exists some natural number whose square is less than 100. For every natural number, its square root is non-negative.

Example queries using Existential Quantifiers

Retrieve the name and address of all employees who work for the 'Research' department.

{ t.Fname, t.Lname, t.Address | EMPLOYEE(t) AND (∃d)(DEPARTMENT(d) AND d.Dname = 'Research' AND d.Dnumber = t.Dno) }

Retrieve the cities where there is a branch office but no properties for rent.

{ b.City | Branch(b) AND (NOT (∃p)(PROPERTY-FOR-RENT(p) AND b.City = p.City)) }
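The existential quantifier corresponds closely to Python's built-in `any`. The sketch below mirrors the Research department query; the rows are invented, and only the attribute names come from the query.

```python
# Hypothetical tuples for EMPLOYEE and DEPARTMENT.
EMPLOYEE = [
    {"Fname": "John", "Lname": "Smith", "Address": "Houston", "Dno": 5},
    {"Fname": "Alicia", "Lname": "Zelaya", "Address": "Spring", "Dno": 4},
]
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5},
    {"Dname": "Administration", "Dnumber": 4},
]

# EMPLOYEE(t) AND (exists d)(DEPARTMENT(d) AND d.Dname = 'Research'
#                            AND d.Dnumber = t.Dno)
answer = [
    (t["Fname"], t["Lname"], t["Address"])
    for t in EMPLOYEE
    if any(d["Dname"] == "Research" and d["Dnumber"] == t["Dno"] for d in DEPARTMENT)
]
print(answer)
```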

Example query using Universal Quantifiers

Retrieve the names of employees who work on all the projects controlled by department 5.

{ e.Fname, e.Lname | EMPLOYEE(e) AND ((∀x)(NOT(PROJECT(x)) OR NOT(x.Dnum = 5) OR ((∃w)(WORKS_ON(w) AND w.Essn = e.Ssn AND x.Pnumber = w.Pno)))) }

We can break up the query into its basic components:

{ e.Fname, e.Lname | EMPLOYEE(e) AND F }
F = ((∀x)(NOT(PROJECT(x)) OR F1))
F1 = NOT(x.Dnum = 5) OR F2
F2 = ((∃w)(WORKS_ON(w) AND w.Essn = e.Ssn AND x.Pnumber = w.Pno))
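The universal quantifier corresponds to Python's built-in `all`, nested here around an `any` for the inner existential part. The rows are invented, and the project-number attribute is assumed to be named Pnumber.

```python
# Hypothetical tuples; department 5 controls projects 1 and 2.
PROJECT = [
    {"Pnumber": 1, "Dnum": 5},
    {"Pnumber": 2, "Dnum": 5},
    {"Pnumber": 10, "Dnum": 4},
]
EMPLOYEE = [{"Ssn": "123", "Lname": "Smith"}, {"Ssn": "453", "Lname": "English"}]
WORKS_ON = [
    {"Essn": "123", "Pno": 1}, {"Essn": "123", "Pno": 2},
    {"Essn": "453", "Pno": 1}, {"Essn": "453", "Pno": 10},
]

# For every project x: x is not controlled by department 5,
# OR some WORKS_ON tuple links employee e to x.
answer = [
    e["Lname"]
    for e in EMPLOYEE
    if all(
        x["Dnum"] != 5
        or any(w["Essn"] == e["Ssn"] and w["Pno"] == x["Pnumber"] for w in WORKS_ON)
        for x in PROJECT
    )
]
print(answer)
```

Iterating over PROJECT already restricts x to project tuples, so the NOT(PROJECT(x)) disjunct of the formula has no counterpart in the code.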

Domain Relational Calculus


Domain relational calculus was proposed by Lacroix and Pirotte in 1977. In domain relational calculus, the variables take their values from the domains of attributes rather than from the tuples of a relation. An expression of the domain relational calculus has the following general form:

{ d1, d2, ..., dn | F(d1, d2, ..., dm) }, m ≥ n

where d1, d2, ..., dn and d1, d2, ..., dm represent domain variables and F represents a formula or condition.

List the first and last names of employees working on a SAP project.

{ FN, LN | (∃EN)(EMPLOYEE(EN,FN,LN,SEX,BDATE,PROJ) AND PROJ = 'SAP') }

List the first and last names of employees working on a SAP project and drawing a salary of more than 30000.

{ FN, LN | (∃EN)(EMPLOYEE(EN,FN,LN,SEX,BDATE,PROJ) AND PROJ = 'SAP' AND SAL > 30000) }
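In Python, domain variables can be imitated by unpacking each tuple into one variable per attribute. The sketch below mirrors the first SAP query; the EMPLOYEE rows are invented and follow the attribute order used in the formula.

```python
# Hypothetical tuples in the order (EN, FN, LN, SEX, BDATE, PROJ).
EMPLOYEE = [
    (1, "Anil", "Kumar", "M", "1980-02-11", "SAP"),
    (2, "Sara", "Khan", "F", "1985-07-30", "ORACLE"),
    (3, "Lina", "Das", "F", "1990-01-05", "SAP"),
]

# { FN, LN | EMPLOYEE(EN, FN, LN, SEX, BDATE, PROJ) AND PROJ = 'SAP' }
answer = {
    (FN, LN)
    for (EN, FN, LN, SEX, BDATE, PROJ) in EMPLOYEE
    if PROJ == "SAP"
}
print(sorted(answer))
```

Each name in the unpacking ranges over a single attribute's domain, whereas a tuple variable in the tuple calculus ranges over whole rows.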
