
MS Access Data Base and Structured Query Language (SQL)

1. Motivation and Context

Though databases are a topic of general interest in computing and information technology, our specific purpose in introducing them here is to obtain enough familiarity with them to implement the pervasive Internet application architecture called the three-tier architecture. The three-tier architecture is a software architecture model that encompasses:
1. A client web browser interacting with
2. a web server capable of executing programs like JSP, where
3. the server in turn mediates or interfaces with a background database such as MS Access.
Thus, the server acts as an intermediary between the client(s) and database(s). This is the essential paradigmatic software architecture for Internet ecommerce applications.

We have already seen how clients and servers behave. We shall now turn our attention to describing the rudiments of databases using the MS Access Database for convenience. MS Access provides a convenient graphical interface for invoking its capabilities or functions, as well as allowing you to use the Structured Query Language (SQL) for querying the data base. SQL is a standard syntax for formulating queries to a database and is generally portable across different database platforms. Understanding SQL, in addition to the manual GUI interface that Access provides, is important because of its widespread use and because we will subsequently interface with the data base from a Java Server Page (JSP) program, which cannot use the GUI interface and can access the data base only through SQL commands. The JSPs that execute on the server can respond to HTML-Form-driven requests from the client, using the client's Form data to interact with the database. This kind of program-driven interaction with the database is essential to any large scale web application because in such cases the number of interactions precludes their being handled manually.

The objectives of this section are to understand how to:
1. Create and populate a data base in Access.
2. Define basic SQL syntax using Access and use built-in Arithmetic Functions.
3. Formulate basic SQL queries.
4. Formulate Queries involving multiple tables: table joins.
5. Use the MS Access graphical tool for creating queries.
6. Use additional miscellaneous capabilities and SQL queries such as establishing password security for a database, SQL data deletion and insertion, creating tables with SQL, the IN operator for sets, and nested queries.

2. Create and Populate a Database Table

Select MS Access from the start menu. Select the blank data base option in the dialog window and navigate to where you want to save the database. [This option is on the right-hand-side panel, not on the toolbar.] Save it as example00.mdb. Refer to web site link Access DB examples > example00.mdb for a copy of the completed example. At this point the screen looks something like:

Access is a table-structured data base. A table has an attribute or property in each of its columns. The rows of the table represent instances of entities with their attributes given. This is best illustrated by an example of a populated table. To create a Table in the data base, select the Tables option under the Objects heading in the dialog window. Then select and double-click the 'Create table in Design View' icon option. Define the attributes for the table in the grid-dialog window that appears, as well as the data types of each of the attributes. See the snapshot below. The data type options are selected from the Data Type column. In the example used, some of the attributes are naturally Text or Number, but the cost is naturally of type Currency.

Observe that this is not where you populate the database with data. This is where you define the attributes or properties of the database table and their data types. This is definitional information about the table, not the contents of the table. It is meta-data as opposed to data.

If you go to close the Table [by clicking the X in the upper right of the window], you will be given a choice of saving the Table. At this point, provide a table name ('Books') in the dialog window that opens. After this, MS Access will also ask whether you want to identify a search key. A search key is an attribute whose value is unique: each value appears in only one row of the table. The purpose of a search key is twofold: it uniquely identifies records in a table; it also speeds up searching for records, since when keys exist for a table, Access can internally organize the table data to expedite locating records/rows with given key values. Familiar examples of search keys are student ids or a social security number. If you say No to Access's request to define a search key, none will be associated with the table. If you define one of the attributes of a table to be a key, Access will prevent you from saving rows (records) which duplicate the key value of any existing record in the table. To declare an attribute a key, right-click the leftmost column just to the left of the intended name of the attribute and select the primary key option in the pop-down menu that appears.

Alternatively, you can let Access automatically create an ID key for a table. For example, if you say 'Yes' when you close the table property definition the first time, Access provides an automatically generated 'ID' whose values start at 1 and are automatically updated sequentially, so it acts as a key. When the table definition is closed, you can re-open it by clicking on the table icon and right-clicking the title bar to select the design view. You can also switch back to the data by selecting DataView the same way. If you selected the automatically generated ID alternative, then the screen would look like the following:

To populate the table, that is, to insert data in its rows/records, just enter data in the attribute fields. The default ID values will be generated automatically: Access will prevent you from entering data in the ID field since Access itself generates these automatic key values. To enter a new row of data, just enter data in the bottom, open or empty row. For example:

To review or revise the table attributes or properties, select Tables under the Objects column, highlight the table (Books) icon of interest, then click the Design icon on the toolbar of the window. Alternatively, right-click the opened data table on its blue title bar and select Design in the pop-down menu that appears. This re-opens the Properties window which lets you change the table properties and data types.

3. Defining SQL Queries in Access and Built-in Arithmetic Functions

SQL is a standardized way to query or ask a question about what's in a data base. The same kind of syntax is used in many different database systems like Access, Oracle, MySQL, etc. It is essential to understand SQL if you want to be able to embed queries in a programming language. We will do this later when we retrieve data from a database by embedding SQL queries in a JSP program. The combination of SQL with a programming language provides a powerful tool which allows one to flexibly process the information in a database. The following examples are more 'hardwired' than queries in a JSP. Here, the attribute values are explicitly specified, while in a program they would usually be variables. The SQL queries utilize the following core syntax:

    SELECT  the data or attributes wanted for the answer
    FROM    database tables needed to handle the query [together with temporary variable names for the tables]
    WHERE   Boolean restrictions which depend on the question, as well as interlinking conditions for the tables needed to answer the query, based on their shared attributes

To define an SQL query in Access, click Queries under Objects in the dialog window, then click 'Create query in Design view'. For simplicity, close the 'show table' dialog window that opens, but leave the 'Select Query' window open. Click the SQL tab on the uppermost, top left toolbar for Access (the outermost window you are working with) and select SQL View on the pop-down menu, which will clear the 'Select Query' window so it just contains 'Select;'. Then enter the desired SQL query in this text area. For example, you can begin with a query that retrieves all (the information on) the Books where the author is 'McHugh' as follows:

    Select  *
    From    Books as b
    Where   b.author = 'McHugh'

In this example, the notation '*' means that all the attribute values of the rows that satisfy the Where condition are to be returned or displayed. The only table involved is Books, which we have given the shorthand name b (just like an algebraic variable). To execute the query, click the red exclamation point ( ! ) that appears on the topmost Access toolbar after you have entered the query. (This requires the Query dialog window to have had at least a <CR> entered in order for the ! to be accessible.)

The query returns the requested results in a table as shown below. To see the original query again, right click the data table on the title bar and select SQL View. If you close the Query-1 window, you will be asked at that point if you want to save it (for future reference). It is important to verify the correctness of the query results, certainly at least when the query is first being tested. It is easy to incorrectly define a query that appears to be right because it gives the right answers. But the query may actually be wrongly formulated. For example, it may also give extraneous answers in addition to the correct ones, or it may omit some of the correct answers. Remember that the answer is only correct if it supplies the whole truth and nothing but the truth. Supplying some of the correct answers is not enough; adding in some incorrect answers as well won't do either.

To selectively retrieve attributes, we list the desired attributes in the Select clause:

    Select  title, cost
    From    Books as b
    Where   b.author = 'McHugh'

In this case a two-column table with title & cost attributes is returned, with rows that satisfy the Where clause condition.

Caveat: Because the Access data font is small, it's easy to inadvertently introduce, for example, a leading blank before the data (such as in typing ' McHugh' by mistake). In such a case, the string ' McHugh' would not match the string 'McHugh' (since the actual data presumably does not have the leading blank), so an empty table would be returned. It is also easy to overlook having introduced a space before the dot in expressions like b.author (such as b .author). This will trigger a popup prompt for a parameter value which is actually irrelevant. Misspelling attribute names also triggers the same indirect manifestation of an error.

To retrieve the books by all authors other than 'McHugh' use the Not operator:

    Select  title, cost
    From    Books as b
    Where   Not (b.author = 'McHugh')

which means the same as: author is not 'McHugh'. The Where clause can also be omitted as in the following query:

    Select  title, author
    From    Books as b

in which case all the title/author entries in the table are retrieved without qualification.

The syntax for using mathematical formulas in SQL is easy and natural. For example, if you want to retrieve the dollar value of the inventory per book where 'McHugh' is the author, the cost per book and the number of books should be multiplied as in:

    Select  title, b.cost * b.stock
    From    Books as b
    Where   b.author = 'McHugh'

To determine the total value of the inventory over all such books, we use the built-in function Sum applied to the formula:

    Select  Sum ( b.cost * b.stock )
    From    Books as b
    Where   b.author = 'McHugh'

which returns the results in the snapshot shown below. Additional examples of arithmetic calculations are given in the baseball database below. The MS Access arithmetic capability can be embedded in JSP programs to facilitate accounting style calculations by relying on built-in services provided by Access. This is quite important since it means that you can let SQL queries embedded in your JSP program perform potentially complex calculations without having to write the algorithms yourself in Java. SQL lets you obtain these effects using simple declarative descriptions of what you want to calculate, with almost no effort on your part except the formulation of the SQL statement.
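The per-title and Sum queries above can be tried outside Access as well. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module in place of Access, and the three Books rows are invented for illustration, since the contents of the sample database are not reproduced here.

```python
import sqlite3

# SQLite stands in for Access here; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (title TEXT, author TEXT, cost REAL, stock INTEGER)")
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?, ?)",
    [
        ("Algorithms", "McHugh", 50.0, 4),
        ("Physics", "Archimedes", 30.0, 10),
        ("Internet", "McHugh", 20.0, 5),
    ],
)

# Inventory value per title for one author: cost times stock, row by row.
per_title = conn.execute(
    "SELECT b.title, b.cost * b.stock FROM Books AS b "
    "WHERE b.author = ? ORDER BY b.title",
    ("McHugh",),
).fetchall()

# Total inventory value for the same author, using the built-in Sum function.
total = conn.execute(
    "SELECT SUM(b.cost * b.stock) FROM Books AS b WHERE b.author = ?",
    ("McHugh",),
).fetchone()[0]

print(per_title)  # [('Algorithms', 200.0), ('Internet', 100.0)]
print(total)      # 300.0
```

The author name is passed as a parameter (the ? placeholder) rather than typed into the query text, which is how a program would normally supply a value that comes from a variable.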

4. More SQL Queries

A Where condition can be a combination of Boolean expressions. For example, the following query retrieves entries with author 'McHugh' or title 'Physics'.

    Select  *
    From    Books as b
    Where   b.author = 'McHugh' or b.title = 'Physics'

The Where clause can also specify a very useful approximate search. It uses a wildcard style of pattern matching syntax. As an example, the following query retrieves titles that begin with the letter 'I', followed by an arbitrary length sequence of other characters.

    Select  *
    From    Books as b
    Where   b.title LIKE 'I*'
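As a rough illustration of the same kind of pattern search driven from a program, the sketch below uses Python's built-in sqlite3 module rather than Access. It rests on several assumptions: SQLite writes the multi-character wildcard as % where Access writes *, the Books rows are invented, and search_titles is a hypothetical helper, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (title TEXT, author TEXT)")
# Hypothetical rows, invented for this illustration.
conn.executemany(
    "INSERT INTO Books VALUES (?, ?)",
    [
        ("Internet", "McHugh"),
        ("Introduction to SQL", "Jones"),
        ("Physics", "Archimedes"),
        ("Algorithms", "McHugh"),
    ],
)

# Titles beginning with 'I'; in SQLite the pattern is 'I%' rather than 'I*'.
starts_with_i = [
    row[0]
    for row in conn.execute(
        "SELECT b.title FROM Books AS b WHERE b.title LIKE 'I%' ORDER BY b.title"
    )
]


def search_titles(fragment):
    """Return titles containing a fragment supplied at run time (hypothetical helper)."""
    pattern = "%" + fragment + "%"
    cursor = conn.execute(
        "SELECT b.title FROM Books AS b WHERE b.title LIKE ? ORDER BY b.title",
        (pattern,),
    )
    return [row[0] for row in cursor]


print(starts_with_i)
print(search_titles("SQL"))
```

Note that a fragment which itself contains a wildcard character would be interpreted as part of the pattern, so a real application would need to decide how to treat such input.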

The '*' in the pattern matches any sequence of zero or more characters. On the other hand, the notation ? denotes any single character, that is, it matches exactly one arbitrary character. Note: When using this SQL notation in Java use % instead of * and _ instead of ? .

This kind of approximate or pattern-based search capability is very important in many applications including ecommerce. It permits a Java Server Program to incorporate embedded SQL that can provide powerful search capabilities in Select queries. Consider, for example, an application for a site that sells books. An approximate search makes it trivial to retrieve books not just on the basis of their complete and exact titles, but also to retrieve titles on the basis of user-supplied search fragments. Furthermore, this capability is immediately available without any algorithmic development required on the part of the JSP programmer. All that is required is the formulation of an appropriate SQL query, which entails only a problem-solving-like formulation, not algorithmic programming. At least as important is the fact that the algorithms which are built in to Access that provide this ability are already thoroughly tested and verified in detail as part of that package. The algorithms do not have to be developed, tested, or verified at the JSP stage.

Particularly in the case where a single attribute is being retrieved, the Where conditions in a query may select duplicate rows. For example, the query "Select author From Books as b Where b.author = 'McHugh'" returns one row containing 'McHugh' for every book that has 'McHugh' as an author. It is simple to remove duplicates by including the key word Distinct in the select clause as follows:

    Select  DISTINCT author
    From    Books as b
    Where   b.author = 'McHugh'

5. Queries Involving Multiple Tables - Table Joins

So far, the examples we have considered involve a single table. However, generally a database contains multiple tables with the entries in one table related to entries in the other tables. To illustrate this, we have added another table to the sample database. We will denote the table as Publishers[name, address, phone, code] which is a standard notation for referring to a table and its attributes. Let us consider a query whose result depends on both tables. Suppose we want to retrieve the addresses of those publishers who publish McHugh's books. The obvious query that springs to mind is:

    Select  p.address
    From    Books AS b, Publishers as p
    Where   b.author = 'McHugh'

However, the problem is that this query will retrieve the addresses of all three publishers listed in the Publishers table, even though only two of the Publishers actually satisfy the query. To understand what is happening, let us first modify the query to display all the attribute values retrieved: "Select * From Books AS b, Publishers as p Where b.author = 'McHugh'". This allows us to better understand what we selected. The retrieved table is shown below.

The retrieved table [Query1] shown here has been sorted on author. Notice that each of the rows for 'McHugh' (from the Books table) has had pasted onto it a copy of every row from the Publishers table. One of these rows introduces an extraneous publisher. Why is this occurring? The reason is that Access first automatically takes all possible combinations of the rows of the two tables, and only then extracts the rows with 'McHugh' as author. Viewed more abstractly, suppose we let A, B, C, D, and E denote the five rows from the Books table and let 1, 2, and 3 denote the three rows in the Publishers table. The so-called Cartesian product of the two tables is: { A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3, E1, E2, E3 } where Xj represents the concatenation of row X from Books and row j from Publishers. This intermediate result is created by Access before the Boolean Where condition (b.author = 'McHugh') is applied. Since 'McHugh' occurs in rows A and D (in the Books table), the selected rows after the condition (b.author = 'McHugh') is applied will be: { A1, A2, A3, D1, D2, D3 } . Only two of these six combinations pair a McHugh book with its actual publisher; the rest are extraneous to our intended query. They only arise because every Publishers row has been pasted on every Books row, regardless of whether there was any relation between the rows. The following discussion shows how to avoid this by joining the tables properly.

The question is how to join tables so as to eliminate extraneous combinations. The way to do this is to use additional conditions in the Where clause that prevent extraneous rows from being combined. In the present example, the publisherCode attribute in the Books table should have the same value as the code attribute in the Publishers table whenever we combine the information in the tables; otherwise it's like combining apples and oranges. We can force this by including linking conditions that let Access combine table rows only when the codes in the combined rows match. The correct query is:

    Select  p.address
    From    Books AS b, Publishers as p
    Where   b.author = 'McHugh' and b.publisherCode = p.code

This returns the Query1 table shown below. In the combined rows that appear in that table, the publishers for 'McHugh' have matching codes 11 and 444 (and addresses NY and NJ) as expected.
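The effect of the linking condition can be checked on a small stand-in database. The sketch below assumes Python's built-in sqlite3 module in place of Access, and its rows are hypothetical: five books (two by 'McHugh') and three publishers, chosen only to mirror the A..E and 1..3 rows of the discussion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (title TEXT, author TEXT, publisherCode INTEGER)")
conn.execute("CREATE TABLE Publishers (name TEXT, address TEXT, phone TEXT, code INTEGER)")
# Hypothetical rows: titles A..E stand for the five Books rows.
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?)",
    [
        ("A", "McHugh", 11),
        ("B", "Jones", 22),
        ("C", "Archimedes", 22),
        ("D", "McHugh", 444),
        ("E", "Smith", 11),
    ],
)
conn.executemany(
    "INSERT INTO Publishers VALUES (?, ?, ?, ?)",
    [
        ("P1", "NY", "212-555-0101", 11),
        ("P2", "CA", "415-555-0102", 22),
        ("P3", "NJ", "201-555-0103", 444),
    ],
)

# Without a linking condition: every McHugh row is paired with every publisher.
unlinked = conn.execute(
    "SELECT b.title, p.address FROM Books AS b, Publishers AS p "
    "WHERE b.author = 'McHugh'"
).fetchall()

# With the linking condition: only rows whose codes match are combined.
linked = conn.execute(
    "SELECT p.address FROM Books AS b, Publishers AS p "
    "WHERE b.author = 'McHugh' AND b.publisherCode = p.code"
).fetchall()

print(len(unlinked))                   # 6 combinations: A1, A2, A3, D1, D2, D3
print(sorted(row[0] for row in linked))  # ['NJ', 'NY']
```

Counting the rows of the unlinked result against the linked one is a quick way to spot a missing join condition when a query is first being tested.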

Further Table Join Examples

The next few examples use a simple data base for baseball teams. The table teamTable has attributes teamName [text] and teamId [text]. The table managerTable has attributes manager [text], town (the town where the team lives), and teamId. A third table, playerTable, has attributes name, teamId, and home, corresponding to the player's name, his teamId attribute (which is common to the three tables), and the player's hometown. The tables are populated as shown.

teamTable
    teamName    teamId
    yankees     1
    dodgers     2
    giants      3

managerTable
    manager     teamId      town
    n1          1           NY
    n2          2           LA
    n3          3           Pittsburgh
    n4          4           SF

playerTable
    name        teamId      home
    p11         1           NY
    p12         1           NY
    p13         1           LA
    p21         2           NY
    p22         2           LA
    p23         3           LA
    p31         3           LA
    p32         3           LA
    p41         4           SF
    p42         4           SF

When following this, use the same values as here so that your queries have the same results as those we describe. The data is extensive enough so we can accurately test whether queries are really finding out what we think they are. Remember you have to carefully verify the results are fully correct, not just partially correct (lacking some results, or containing superfluous results). Most of the examples are recorded in Access DB examples > exp04.mdb. The following problems all require queries that use more than one table. We have expressed the questions as natural language (English) statements, so it requires some thinking to see how to formulate them as SQL queries.

i. Find in what town the yankees play. The teamTable does not directly name the team's town, only the teamId. However, the teamTable does associate a teamName with the teamId. Thus, the query entails combining two tables using a linking condition based on teamId:

    SELECT  ... to be filled in ...
    FROM    managerTable AS m, teamTable AS t
    WHERE   t.teamId=m.teamId AND ... to be filled in ...

The rest of the Where condition has to identify that we are interested in the manager of the yankees, and the Select clause has to identify the expected results; namely, the manager's town m.town. The completed query is:

    SELECT  m.town
    FROM    managerTable AS m, teamTable AS t
    WHERE   t.teamId=m.teamId AND t.teamName='yankees'

Here the teamName is quoted (in single quotes) because teamName is of type text. Had it been of type number, the quotes would have been omitted. Incidentally, if you enter an incorrect attribute for a table (say, m.name instead of m.town), the system will respond with a request for a parameter value.

ii. Find all the Yankee players who live in the same town as their manager. The question does not have an immediately obvious SQL formulation. Furthermore, to answer the question, you have to link together all the tables it requires: the managerTable, teamTable, and playerTable. Failing to link any of these leads to extraneous results. For example, if you formulate the query as:

    SELECT  p.name
    FROM    managerTable AS m, playerTable AS p, teamTable AS t
    WHERE   t.teamId=m.teamId AND p.home=m.town AND teamName='yankees'

you pick up an extraneous player (from the dodgers, team 2: p21) because the tables were not properly joined. Since there are three tables with shared attribute teamId, all three have to be linked, requiring two linking conditions as follows:

    SELECT  p.name
    FROM    managerTable AS m, playerTable AS p, teamTable AS t
    WHERE   t.teamId = m.teamId AND m.teamId=p.teamId AND p.home=m.town AND teamName='yankees'
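The two formulations of problem ii can be compared directly on the baseball data given in the tables above. This is a sketch only, assuming Python's built-in sqlite3 module as a stand-in for Access; the table contents are copied from the tables in this section, and the variable names are just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teamTable (teamName TEXT, teamId TEXT);
CREATE TABLE managerTable (manager TEXT, teamId TEXT, town TEXT);
CREATE TABLE playerTable (name TEXT, teamId TEXT, home TEXT);
""")
conn.executemany("INSERT INTO teamTable VALUES (?, ?)",
                 [("yankees", "1"), ("dodgers", "2"), ("giants", "3")])
conn.executemany("INSERT INTO managerTable VALUES (?, ?, ?)",
                 [("n1", "1", "NY"), ("n2", "2", "LA"),
                  ("n3", "3", "Pittsburgh"), ("n4", "4", "SF")])
conn.executemany("INSERT INTO playerTable VALUES (?, ?, ?)", [
    ("p11", "1", "NY"), ("p12", "1", "NY"), ("p13", "1", "LA"),
    ("p21", "2", "NY"), ("p22", "2", "LA"), ("p23", "3", "LA"),
    ("p31", "3", "LA"), ("p32", "3", "LA"), ("p41", "4", "SF"),
    ("p42", "4", "SF"),
])

# Players are not linked to the manager's team here, so p21 slips in.
missing_link = [row[0] for row in conn.execute(
    "SELECT p.name FROM managerTable AS m, playerTable AS p, teamTable AS t "
    "WHERE t.teamId = m.teamId AND p.home = m.town AND teamName = 'yankees' "
    "ORDER BY p.name"
)]

# Both linking conditions present: only Yankee players are considered.
fully_linked = [row[0] for row in conn.execute(
    "SELECT p.name FROM managerTable AS m, playerTable AS p, teamTable AS t "
    "WHERE t.teamId = m.teamId AND m.teamId = p.teamId "
    "AND p.home = m.town AND teamName = 'yankees' "
    "ORDER BY p.name"
)]

print(missing_link)  # ['p11', 'p12', 'p21']
print(fully_linked)  # ['p11', 'p12']
```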

iii. Find the team [by name] which has a manager who lives in Pittsburgh. Various attempted queries are illustrated in Query 10-11 in exp04.mdb; query 10 illustrates a misspelling.

iv. Find all the towns where a manager lives but some (nonzero number) of that manager's players do not live in that same town.

6. Not Operator, avg, min, max Functions

It's helpful to first understand the query by examining the data provided by merging the manager and player data:

    SELECT  *
    FROM    managerTable m, playerTable p
    WHERE   m.teamId = p.teamId

This correctly combines manager information with the data for each player on the manager's team (as opposed to merely merging rows that should not be matched up, like managers with players from other teams). You can use these results to identify which players live in different towns from their managers. Given this, you can verify the correctness of the following where we have added the additional constraint that the manager and player live in different places using the Boolean NOT operator:

    SELECT  *
    FROM    managerTable m, playerTable p
    WHERE   m.teamId = p.teamId AND NOT ( p.home = m.town )
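The NOT query can be checked against the manager and player data listed in section 5. The sketch below assumes Python's built-in sqlite3 module rather than Access. The final query, which selects the distinct manager towns, is one possible way of finishing problem iv and is an assumption of this sketch, not a query taken from exp04.mdb.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE managerTable (manager TEXT, teamId TEXT, town TEXT);
CREATE TABLE playerTable (name TEXT, teamId TEXT, home TEXT);
""")
conn.executemany("INSERT INTO managerTable VALUES (?, ?, ?)",
                 [("n1", "1", "NY"), ("n2", "2", "LA"),
                  ("n3", "3", "Pittsburgh"), ("n4", "4", "SF")])
conn.executemany("INSERT INTO playerTable VALUES (?, ?, ?)", [
    ("p11", "1", "NY"), ("p12", "1", "NY"), ("p13", "1", "LA"),
    ("p21", "2", "NY"), ("p22", "2", "LA"), ("p23", "3", "LA"),
    ("p31", "3", "LA"), ("p32", "3", "LA"), ("p41", "4", "SF"),
    ("p42", "4", "SF"),
])

# Players whose home differs from the town of their own manager.
mismatched = conn.execute(
    "SELECT p.name, p.home, m.town FROM managerTable m, playerTable p "
    "WHERE m.teamId = p.teamId AND NOT (p.home = m.town) ORDER BY p.name"
).fetchall()

# Assumed completion of problem iv: each such manager town listed once.
towns = [row[0] for row in conn.execute(
    "SELECT DISTINCT m.town FROM managerTable m, playerTable p "
    "WHERE m.teamId = p.teamId AND NOT (p.home = m.town) ORDER BY m.town"
)]

print([name for name, _, _ in mismatched])  # ['p13', 'p21', 'p23', 'p31', 'p32']
print(towns)                                # ['LA', 'NY', 'Pittsburgh']
```

SF does not appear because both of that manager's players live in SF.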

The following example illustrates how built-in SQL mathematical functions can be combined. Suppose we have added new attributes to teamTable for average player salary (salary, of type currency) and number of players per team (numPlayers, of type number). Verify that the following query calculates the average salary per player over all teams taken together:

    SELECT  sum ( numPlayers * salary ) / sum ( numPlayers )
    FROM    teamTable

The formula calculates the total outlay for salaries, divided by the total number of players on all the teams. You could also try to calculate the average salary using the built-in function avg:

    SELECT  avg ( salary )
    FROM    teamTable

but this would not calculate the overall average correctly, since the overall average is a weighted average: each team's average salary has to be weighted by its number of players. Other functions include: max, min.

7. Access Graphical Tool for Defining Queries

MS Access also provides a graphical tool for creating SQL queries which does not require explicit awareness of SQL syntax. For example, suppose you want to determine the addresses of all the publishers that author 'McHugh' publishes for and which have the digits '33' somewhere in their phone number. The SQL query would look like:

    Select  p.address, p.phone
    From    Books as b, Publishers as p
    Where   b.author = 'McHugh' AND b.publisherCode = p.code AND p.phone LIKE '*33*'

We could also define this query graphically as follows:
1. Click Create query in Design View.
2. Use the Show-Table window that appears and click the Add button to add the (selected) tables you want to Join together. The table-icons will appear in the upper section of the dialog window along with a scrolling view of their attribute names.
3. To link (Join) a pair of the icon-tables (like on the shared attributes publisherCode and Code), highlight one of the attributes [say, publisherCode] in one of the tables and drag it over the corresponding attribute [say, Code] in the other table, producing a two-way arrow between the linked attributes. Note: In this example, delete the default arrow between the automatically generated ID attributes since they don't make sense in this problem and impose irrelevant restrictions on the table entries. The tables can then be merged using run (denoted by the red exclamation point ! ).
4. To graphically define Where criteria, click in one of the fields at the bottom of the dialog window and select a desired attribute from the resulting pull-down menu. To define a Boolean condition, enter the right-hand-side of the intended condition in the Criteria field, for example: Like '*33*' for the phone number condition, which is equivalent to the condition: Where p.phone Like '*33*' . This can be repeated for multiple attributes, with the resulting conditions then ANDed. If you enter the Criterion in the "or" field instead, it acts like an OR rather than an AND.
5. To make an attribute appear in the retrieved table, click the Show checkbox.
6. Finally, click run [ ! ] to see the resulting table of selected attributes.

For a related simple example, do a join on the two tables with the shared code as the linked field. The resulting table contains only consistent data rather than incompatible extraneous data items. Another example of the Graphical query tool is for the following baseball example.
The query is: Find all the Yankee players who live in the same town as their manager. Using the Show table tool under Create query in Design view, add the managerTable and the playerTable (and the teamTable, which supplies the teamName attribute). Link them on the shared teamId attribute as before. The same thing can be done with other attributes that may not be shared but which you want to force to be equal. For example, also link the managerTable and playerTable on the town and home attributes, since these are expected to be the same: this way we do not have to explicitly set up that query criterion. Additionally, force the teamName to be 'yankees' using a Field criterion. Then, select what you want to have shown in the result table, let's say the players' names. These are selected at the bottom of the window using the Show checkbox. The snapshot shows the setup. If you want to require a negative condition, say that the team is not the yankees, you can use Not ('yankees') or <> 'yankees' as the criterion.

8. Additional MS Access Functionality

8.1 Database Password Protection

You can easily provide password protection for an Access database. First close any open copy of the database. Then, re-open MS Access, but not the sample data base itself. Then, select File > Open in the Access menu, navigate to the directory with the file to be secured, and select or highlight (but don't double-click or open) the desired database file. Clicking the open-menu-button on the bottom right of the window (see the figure below) will open a pull-down menu: select the 'open exclusive' option.

Then, select the Tools > security > Set database password option in the database dialog window toolbar (below) and set the password in the resulting dialog window. If you wish to remove password protection, again, first close Access. Then reopen it with the 'open-exclusive' option described previously. Then, select Tools > security > unset database password to remove the password. You will be asked for the password again when you try to change the password, even though you already gave the password when you opened the database in the first place. Arguably, this second request is appropriate. For example, you may have walked away from your computer after initially opening the database. You would not want someone to be able to interlope and change the password at that point without the screening test of the additional password request.

8.2 Delete or Insert Rows in Tables and Create new Tables

The following commands are more likely to be used in the context of a JSP program. For example, the Insert command would probably be used with variables whose value[s] were obtained from an HTML Form. So far we have obtained the same effects using the Access interface for making tables and entering values directly into tables, but in programming applications the GUI interface of Access is unavailable. Furthermore the GUI interface primarily uses fixed rather than variable expressions.

a. Insert a row in a table:

    Insert Into Publishers Values ( '6', 'Auerbach', 'NY' , '212-777-7777', '99')

This inserts a row of data into the Publishers table. If the SQL occurs in a Java statement, the single quotes ( ' ) still have to be included. The number of values provided in the list has to match the number of attributes in the table. In this example, Publishers also has an automatically generated key ID. The current maximum value of ID is 5, so the next ID should be '6', though the keys do not have to be in sequential order, just nonredundant. If a copy of the table is already open, the insert will not be immediately apparent. You have to close the table and reopen it to see the inserted row. If you try to add a row where the ID key duplicates an existing key value, the request will be blocked.

b. Deleting rows from a table:

The syntax for delete is as illustrated:

    Delete From Publishers Where ID = 12 ;

This removes the row with ID=12 from the indicated table. The attribute ID is a number so it is not enclosed in quotes. If a statement like:

    Delete From Publishers Where name = 'Thomson'

were used, where the attribute is not a unique key, then all rows satisfying name = 'Thomson' are deleted.

c. Creating a new table:

    Create table table1 ( ssn integer , name char(22) )

This creates a new table named table1 with the indicated attributes ssn and name. The attributes must be given data types. The type integer for ssn makes ssn of type number. The type char(22) for name makes name of type text, with the field size for the text attribute in this case being 22 characters. You can verify this under the Table Design view of table1's properties. Select the name attribute or the text Data Type for the name attribute: the Field Properties at the bottom of the window displays the name field characteristics. The same applies to the number property.

8.3 Set Operator IN and Nested Queries

The Boolean IN operator acts like a set membership test or operator. Thus, property IN ( 'x', 'y', 'z' ) is true if the property or attribute's value is 'x' or 'y' or 'z'. The IN can also be thought of as replacing a number of OR conditions. The following query illustrates the syntax. It returns the title, publisher name, and author of every book whose author is 'McHugh' or 'Archimedes'.

    Select  b.title, p.name, b.author
    From    Publishers as p, Books as b
    Where   b.author IN ( 'McHugh' , 'Archimedes' ) and p.code = b.publisherCode

The second part of the Where condition links the two tables so the Join is done correctly. The IN condition is equivalent to the OR statement: b.author = 'McHugh' OR b.author = 'Archimedes'
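The equivalence of IN and OR can be checked on a small stand-in database. The sketch below assumes Python's built-in sqlite3 module in place of Access, and the Books and Publishers rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (title TEXT, author TEXT, publisherCode INTEGER)")
conn.execute("CREATE TABLE Publishers (name TEXT, address TEXT, phone TEXT, code INTEGER)")
# Hypothetical rows, invented for this illustration.
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?)",
    [
        ("Algorithms", "McHugh", 11),
        ("Levers", "Archimedes", 22),
        ("Cooking", "Jones", 11),
    ],
)
conn.executemany(
    "INSERT INTO Publishers VALUES (?, ?, ?, ?)",
    [
        ("P1", "NY", "212-555-0101", 11),
        ("P2", "NJ", "201-555-0102", 22),
    ],
)

# Set membership test with IN.
in_rows = conn.execute(
    "SELECT b.title, p.name, b.author FROM Publishers AS p, Books AS b "
    "WHERE b.author IN ('McHugh', 'Archimedes') AND p.code = b.publisherCode "
    "ORDER BY b.title"
).fetchall()

# The same condition spelled out with OR; the parentheses keep the OR
# together so that the linking condition applies to both alternatives.
or_rows = conn.execute(
    "SELECT b.title, p.name, b.author FROM Publishers AS p, Books AS b "
    "WHERE (b.author = 'McHugh' OR b.author = 'Archimedes') "
    "AND p.code = b.publisherCode "
    "ORDER BY b.title"
).fetchall()

print(in_rows)
print(in_rows == or_rows)
```

When the OR form is combined with a linking condition, the parentheses matter: AND binds more tightly than OR, so without them the join condition would apply to only one of the two alternatives.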

The IN operator works nicely with what are called nested queries. One merely replaces the list of values inside the IN operator's parentheses with an SQL query which returns a single-attribute-table of values. For example:

    Select  b.author, b.title
    From    Publishers as p, Books as b
    Where   p.code IN ( Select p.code From Publishers p Where p.address IN ( 'NY', 'NJ' ) )
            AND p.code = b.publisherCode

Notice that the structure of the IN clause is: property-x IN ( Select property-x From ...) with the same property occurring in both places. Thus this Where condition is equivalent to restricting the codes to those of publishers located in NY or NJ. The outer Select then uses these codes to identify authors who work with those publishers.

Construction Notes
1. find teams of players who live in ny but their managers don't - dodgers
2. select distinct teamName from teamTable as t , managerTable as m, playerTable as p where not ( m.town = 'ny' ) and p.teamID=m.teamId and p.teamId=t.teamId and p.name IN ( select name from playerTable as p where p.home = 'ny' )
3. better nested example
4. prompts & parameter
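The nested query of construction note 2 can be tried on the baseball tables from section 5. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module rather than Access, it writes 'NY' in upper case because SQLite compares strings case-sensitively with =, and it renames the inner table alias to p2 purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teamTable (teamName TEXT, teamId TEXT);
CREATE TABLE managerTable (manager TEXT, teamId TEXT, town TEXT);
CREATE TABLE playerTable (name TEXT, teamId TEXT, home TEXT);
""")
conn.executemany("INSERT INTO teamTable VALUES (?, ?)",
                 [("yankees", "1"), ("dodgers", "2"), ("giants", "3")])
conn.executemany("INSERT INTO managerTable VALUES (?, ?, ?)",
                 [("n1", "1", "NY"), ("n2", "2", "LA"),
                  ("n3", "3", "Pittsburgh"), ("n4", "4", "SF")])
conn.executemany("INSERT INTO playerTable VALUES (?, ?, ?)", [
    ("p11", "1", "NY"), ("p12", "1", "NY"), ("p13", "1", "LA"),
    ("p21", "2", "NY"), ("p22", "2", "LA"), ("p23", "3", "LA"),
    ("p31", "3", "LA"), ("p32", "3", "LA"), ("p41", "4", "SF"),
    ("p42", "4", "SF"),
])

# Teams with a player living in NY whose manager does not live in NY.
# The inner Select supplies the set of names tested by IN.
teams = [row[0] for row in conn.execute(
    "SELECT DISTINCT t.teamName "
    "FROM teamTable AS t, managerTable AS m, playerTable AS p "
    "WHERE NOT (m.town = 'NY') "
    "AND p.teamId = m.teamId AND p.teamId = t.teamId "
    "AND p.name IN (SELECT p2.name FROM playerTable AS p2 WHERE p2.home = 'NY')"
)]

print(teams)  # ['dodgers']
```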
