
Introduction To SQL

Introduction To TSQL
Unit 3

Modern Business
Technology
Developed by
Michael Hotek
Unit 3

Goals
• Nulls
• Group by
• Order by
• Distinct
• Aggregates
• Aggregates with grouping
• Having
• Compute
• Unions
Null

• There are times when data is missing or incomplete

• To handle this missing data, most DBMSs use the concept of a null

• A null does not mean zero

• A null also does not mean a blank

• A null indicates that a value is missing, unavailable, incomplete, or inapplicable
Null

• Nulls represent an unknown quantity or value

• You can't guarantee that a null does equal some other value

• You also can't guarantee that a null doesn't equal another value

• A null also might or might not equal another value
Null

• For example, take the authors table

• If we were to leave out the state data for an author, this could bring up a few questions

• Is the author from CA?

• Is the author not from CA?

• Is the author from some other state?

• Any or none of these could be true
Null

• Any question about a null could have three answers: yes, no, or maybe

• This could mean that using nulls gives us a very serious problem, since rows are selected based on a criterion being true

• Fortunately, the DBMS manufacturers have given us some relief
Rules for Nulls

• A null designates an unknown value

• A null does not equal another distinct value

• A null does not equal another null

• WAIT A MINUTE!!!
Nulls cont.

• I can obviously test for a null and I can place a null into a column

• Since I am placing the same "value" (a null) into a column, how can a null not equal a null?

• A null represents the nonexistence of data

• Something that doesn't exist can't be compared with something else that doesn't exist

• If it could, this would imply that the values being compared actually do exist. This violates the definition of a null
Nulls (theory aside)

• All of this appears to be rather deep and theoretical. In fact, entire books have been written about nulls.

• This class is based on the practical application of SQL theory

• To that end, the only things you need to remember are the following:
– You can select rows that have a null value
– A null does not equal a null
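
The yes / no / maybe behavior of nulls can be observed directly. What follows is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory SQLite database in place of Sybase or MS SQL Server. SQLite reports an unknown comparison result as a null, which Python shows as None.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the class DBMS.
conn = sqlite3.connect(":memory:")

# Comparing a null with "=" yields unknown (None); "is null" yields true (1).
null_equals_null, null_equals_zero, null_is_null = conn.execute(
    "select null = null, null = 0, null is null"
).fetchone()

print(null_equals_null, null_equals_zero, null_is_null)
conn.close()
```

Only the explicit is null test gives a definite answer, which is why it is the form to remember.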
Nulls Applied

• Suppose we want to get the titles that do not have an assigned royalty

• Based on our previous experience we would probably do the following:
– select * from titles where royalty = null

• Paradoxically, this works in some DBMSs, such as Sybase and MS SQL Server with ANSI null handling turned off

• This is because these DBMS manufacturers recognize the problems with null and seek to protect you from yourself. The DBMS will convert this into its proper form and return what you asked for

• Under standard SQL rules, royalty = null is never true, so the same query returns no rows
Nulls Applied

• The proper way is to be explicit in what you are asking.

• We want to know where the values are null

select title, royalty from titles where royalty is null

title                                   royalty
--------------------------------------- -----------
The Psychology of Computer Cooking      (null)
Net Etiquette                           (null)

(2 row(s) affected)
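
To see the difference between = null and is null in a row selection, here is a small sketch. It assumes Python's sqlite3 module and a cut-down titles table; the third row and its royalty value are made up for illustration. SQLite follows the standard rule, so it does not convert = null for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table titles (title text, royalty integer)")
conn.executemany(
    "insert into titles values (?, ?)",
    [
        ("The Psychology of Computer Cooking", None),
        ("Net Etiquette", None),
        ("Sushi, Anyone?", 10),  # hypothetical royalty, for illustration only
    ],
)

# "= null" is never true under standard rules, so no rows qualify.
equals_rows = conn.execute(
    "select title from titles where royalty = null"
).fetchall()

# "is null" asks the explicit question and finds the two missing royalties.
is_null_rows = conn.execute(
    "select title from titles where royalty is null order by title"
).fetchall()

print(equals_rows)
print(is_null_rows)
conn.close()
```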
The Basics recap

• This completes all of the basics of selecting data

• To quickly recap

• The select clause specifies what columns we want to see

• The from clause tells what table we want to see data from

• The where clause restricts the data we will see
Order by

• The order by clause is used to specify a sorting order of the result set

• The sorting can be performed by column name or by column number

select au_fname,au_lname from authors order by au_lname,au_fname

or

select au_fname,au_lname from authors order by 2,1
Order by

• Depending upon the DBMS, the column you are ordering by does not need to be specified in the select clause

select au_fname, au_lname from authors order by state

• While this does work on some DBMSs, it is generally not advisable

• The default sort order is ascending (a-z), but you can specify a descending order by using the keyword desc

• …order by au_lname desc, au_fname
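
The three forms of order by can be compared side by side. This is a sketch only, assuming Python's sqlite3 module and a four-row authors table built for the purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table authors (au_fname text, au_lname text)")
conn.executemany(
    "insert into authors values (?, ?)",
    [("Ann", "Dull"), ("Meander", "Smith"), ("Abraham", "Bennet"), ("Patti", "Smythe")],
)

# Ordering by column name and by column position gives the same result.
by_name = conn.execute(
    "select au_fname, au_lname from authors order by au_lname, au_fname"
).fetchall()
by_number = conn.execute(
    "select au_fname, au_lname from authors order by 2, 1"
).fetchall()

# desc reverses the default ascending order.
descending = [
    row[0]
    for row in conn.execute("select au_lname from authors order by au_lname desc")
]

print(by_name)
print(descending)
conn.close()
```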
Sort Order

• If order by sorts the data, how do I know what order it is sorted in?

• The sort order is determined by a character set which is defined for a database

• In Sybase and MS SQL Server, this character map can be retrieved by executing sp_helpsort

exec sp_helpsort
Order by

• An order by is not limited to actual data columns

• We can order by a calculation if we wish

select au_fname + ' ' + au_lname name from authors order by name

name
-------------------------------------------------------------
Abraham Bennet
Akiko Yokomoto
Albert Ringer
Ann Dull
...
Meander Smith
Michael O'Leary
Michel DeFrance
Morningstar Greene
Patti Smythe
Reginald Blotchet-Halls
Sheryl Hunter
Stearns MacFeather
Sylvia Panteley

(27 row(s) affected)
Order by / Nulls

• An order by is based upon a sort order specified by a character set

• Since nulls aren't characters, where do these fit in?

• Depending on the DBMS, you will find the nulls at either the beginning or the end of the result set.

• Where they are depends on how the DBMS manufacturer has implemented the sort
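
As one concrete data point, the sketch below checks where SQLite places nulls. It assumes Python's sqlite3 module and a three-row titles table with made-up title_id values. Other DBMSs may place nulls differently, so treat the result as specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table titles (title_id text, type text)")
conn.executemany(
    "insert into titles values (?, ?)",
    [("T1", "business"), ("T2", None), ("T3", "psychology")],  # hypothetical ids
)

# SQLite treats a null as smaller than any other value when sorting.
types_ascending = [
    row[0] for row in conn.execute("select type from titles order by type")
]
types_descending = [
    row[0] for row in conn.execute("select type from titles order by type desc")
]

print(types_ascending)
print(types_descending)
conn.close()
```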
Distinct

• As you have seen from some of the queries we have run, you can get what appear to be duplicate rows in the result set

• From the scope of the result set, they are duplicates

• From the scope of the database, they are not

• This is because the select statements we have performed up to this point returned a row of data for every row in a table that matched the specified criteria
Distinct

• Sometimes we do not want to see these duplicate rows

• We can eliminate them by use of the distinct keyword

• The distinct is placed immediately after the select

• There can also be only one distinct per SQL statement

• The distinct applies to all columns in the select list
Distinct

select au_id from titleauthor

au_id
-----------
172-32-1176
213-46-8915
213-46-8915
238-95-7766
267-41-2394
267-41-2394
...
899-46-2035
899-46-2035
998-72-3567
998-72-3567

(25 row(s) affected)

select distinct au_id from titleauthor

au_id
-----------
172-32-1176
213-46-8915
238-95-7766
267-41-2394
...
899-46-2035
998-72-3567

(19 row(s) affected)
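
The point that distinct applies to every column in the select list is easy to miss. The sketch below illustrates it, assuming Python's sqlite3 module and a four-row titleauthor table whose rows are chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table titleauthor (au_id text, title_id text)")
conn.executemany(
    "insert into titleauthor values (?, ?)",
    [
        ("172-32-1176", "PS3333"),
        ("213-46-8915", "BU1032"),
        ("213-46-8915", "BU2075"),
        ("238-95-7766", "PC1035"),
    ],
)

all_ids = conn.execute("select au_id from titleauthor").fetchall()
distinct_ids = conn.execute(
    "select distinct au_id from titleauthor order by au_id"
).fetchall()

# With two columns, distinct compares whole rows, so nothing is removed here.
distinct_pairs = conn.execute(
    "select distinct au_id, title_id from titleauthor"
).fetchall()

print(len(all_ids), len(distinct_ids), len(distinct_pairs))
conn.close()
```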
Aggregates

• There are times when we want to perform calculations on all of the values in a column or table

• We accomplish this through the use of aggregates

• The three we will explore are count, sum, and average
Count(*)

• Count will return exactly what its name implies

• It returns a count of the number of rows in a table that match the given criteria

select count(*) from authors will return the number of rows in the authors table

-----------
27

(1 row(s) affected)

select count(*) from authors where state = 'CA' will return the number of authors living in CA

-----------
15

(1 row(s) affected)
Sum

• The sum is used to add up all of the values in a column

select sum(advance) from titles will return the total amount advanced to all authors

--------------------------
95,400.00

(1 row(s) affected)
Avg

• Avg will return the average value in a column

select avg(price) from titles will return the average price of all books

--------------------------
14.77

(1 row(s) affected)

select avg(price) from titles where price > 10 will return the average price of the books over $10

--------------------------
17.94

(1 row(s) affected)
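
The three aggregates can be tried together on a table small enough to check by hand. This is a sketch only, assuming Python's sqlite3 module; the three rows and their values are invented so the arithmetic is easy to follow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table titles (title_id text, price real, advance real)")
conn.executemany(
    "insert into titles values (?, ?, ?)",
    [("T1", 20.0, 5000.0), ("T2", 10.0, 7000.0), ("T3", 6.0, 3000.0)],  # made-up rows
)

# One pass over the table: row count, total advance, average price.
row_count, total_advance, average_price = conn.execute(
    "select count(*), sum(advance), avg(price) from titles"
).fetchone()

# A where clause limits the rows the aggregate sees; only T1 costs more than 10.
average_over_10 = conn.execute(
    "select avg(price) from titles where price > 10"
).fetchone()[0]

print(row_count, total_advance, average_price, average_over_10)
conn.close()
```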
Group by

• Data in a table is essentially stored in no particular order

• We can impose one type of order on the result set with an order by

• We can impose another type of order on a result set by using a group by clause
Group by

• The group by will organize the data into the groups that you specify and then return one row for each group

• Duplicates are removed from this result set

• In this way, a group by performs a similar operation to distinct

• The group by does not sort the data though

• You still need to specify an order by clause to perform sorting
Group by

select type from titles group by type

type
------------
(null)
UNDECIDED
popular_comp
business
mod_cook
trad_cook
psychology

(7 row(s) affected)

select type from titles group by type order by 1

type
------------
(null)
UNDECIDED
business
mod_cook
popular_comp
psychology
trad_cook

(7 row(s) affected)
Group by and Nulls

• Nulls are treated specially by a group by clause

• When a group by is being evaluated, all nulls are put in the same group

select type from titles group by type

type
------------
(null)
UNDECIDED
business
mod_cook
popular_comp
psychology
trad_cook

(7 row(s) affected)
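
Even though a null never equals another null, the grouping rule above puts them together. A minimal sketch, assuming Python's sqlite3 module and five made-up rows, two of which have no type:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table titles (title_id text, type text)")
conn.executemany(
    "insert into titles values (?, ?)",
    [
        ("T1", None),
        ("T2", "business"),
        ("T3", None),
        ("T4", "business"),
        ("T5", "psychology"),
    ],  # hypothetical rows
)

# The two null types collapse into a single group, just like the two
# business rows do. The order by is added so the output order is fixed.
groups = conn.execute(
    "select type from titles group by type order by type"
).fetchall()

print(groups)
conn.close()
```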
Group by and where

• You can use a where clause to limit the set of data that the group by will consider

select type from titles where advance > 5000 group by type

type
------------
business
mod_cook
popular_comp
psychology
trad_cook

(5 row(s) affected)
Group by

• The true power of a group by comes from using it in conjunction with an aggregate

• Suppose we wanted a count of each type of book

• At first thought you might be tempted to do this:

select type,count(*) from titles

Msg 8118, Level 16, State 1
Column 'titles.type' is invalid in the select list because it is not contained in an aggregate function and there is no GROUP BY clause.
Group by

• The previous query fails, so it doesn't get us what we need. Adding a group by does

select type,count(*) from titles group by type

type
------------ -----------
(null)       2
UNDECIDED    1
business     2
mod_cook     2
popular_comp 3
psychology   5
trad_cook    3

(7 row(s) affected)
Group by

• One thing to remember is that if you use a group by with an aggregate, you must specify all nonaggregate columns in the group by clause

select city,state,count(*) from authors group by state will return an error

Msg 8120, Level 16, State 1
Column 'authors.city' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

select city,state,count(*) from authors group by state,city will return a result set

city                 state
-------------------- ----- -----------
(null)               MA    4
Ann Arbor            MI    1
Berkeley             CA    2
Corvallis            OR    1
Covelo               CA    1
Gary                 IN    1
...
(17 row(s) affected)
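
Grouping on every nonaggregate column can be sketched as follows, assuming Python's sqlite3 module and a four-row authors table made up for the purpose. One caveat: SQLite is more permissive than MS SQL Server and does not raise the Msg 8120 error for the incomplete group by, so only the correct form is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table authors (au_lname text, city text, state text)")
conn.executemany(
    "insert into authors values (?, ?, ?)",
    [
        ("Bennet", "Berkeley", "CA"),
        ("Carson", "Berkeley", "CA"),
        ("Green", "Oakland", "CA"),
        ("Blotchet-Halls", "Corvallis", "OR"),
    ],  # illustrative rows
)

# Both nonaggregate columns (city, state) appear in the group by list,
# so count(*) is computed once per distinct city/state pair.
counts = conn.execute(
    "select city, state, count(*) from authors "
    "group by state, city order by state, city"
).fetchall()

print(counts)
conn.close()
```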
Group by

• You cannot specify an aggregate in the group by clause

select count(*) from authors group by count(*) will return a syntax error

Msg 144, Level 15, State 1
Cannot use an aggregate or a subquery in an expression used for the by-list of a GROUP BY clause.
Having

• The having clause works much like a where clause, with one fundamental difference

• The where clause defines the set of data the grouping is done on

• The having defines which groups are going to be returned to the user
Having

• Having clauses generally contain aggregates as part of the selection criteria

select pub_id,sum(advance) from titles group by pub_id having sum(advance) > 10000

pub_id
------ --------------------------
0736   24,400.00
0877   41,000.00
1389   30,000.00

(3 row(s) affected)

• This will return only the pub_ids whose total advance is more than $10000.
Having/Where

select type,count(advance) from titles where advance > 10000 group by type,advance

select type,count(advance) from titles group by type,advance having advance > 10000
Having/Where

• In both queries we want to know the types of those books with an advance > 10000, so why the different results?

• This is due to the way the where and having are applied

• What happens is the data is first selected based on the where clause
• It is then passed to the group by for grouping
• Finally it goes to the having, which returns the data requested.
Having/Where

• In the first query, only those rows that had an advance of > $10000 are selected

• The grouping is then applied to these rows

• This was only 1 book for each of two groups (the where criteria)
Having/Where

• The having processes the aggregates and grouping first, instead of the selection as where does

• The having clause says give me the groups that have one or more books with an advance of > 10000
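
The order of processing (where, then group by, then having) is easiest to see when the same threshold is applied in each place. The sketch below assumes Python's sqlite3 module and four invented rows; it uses sum(advance) rather than the exact queries above so the difference shows up clearly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table titles (title_id text, pub_id text, advance real)")
conn.executemany(
    "insert into titles values (?, ?, ?)",
    [
        ("T1", "0736", 8000.0),
        ("T2", "0736", 7000.0),
        ("T3", "0877", 15000.0),
        ("T4", "0877", 2000.0),
    ],  # made-up advances
)

# where runs before grouping: only T3 survives, so only 0877 forms a group.
filtered_by_where = conn.execute(
    "select pub_id, sum(advance) from titles "
    "where advance > 10000 group by pub_id order by pub_id"
).fetchall()

# having runs after grouping: every row is grouped, then groups whose
# total is 10000 or less are dropped. Both publishers qualify.
filtered_by_having = conn.execute(
    "select pub_id, sum(advance) from titles "
    "group by pub_id having sum(advance) > 10000 order by pub_id"
).fetchall()

print(filtered_by_where)
print(filtered_by_having)
conn.close()
```

Publisher 0736 has no single advance over 10000, so the where version drops it, yet its total of 15000 passes the having test.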
Where/Having

• The concepts of where and having clauses can get confusing very quickly

• The best way to get comfortable with them is to run a few queries and observe the results

• Then draw out each of the steps on paper until you can duplicate the result set

• The book "The Practical SQL Handbook" has a good explanation on pages 180 - 185
Compute

• Now that everything is about as clear as mud, we are going to introduce another clause that can be employed (compute)

• In a nutshell, a compute is used to calculate grand summaries

select title_id,type,price from titles where type like '%cook%' compute avg(price)

title_id type         price
-------- ------------ --------------------------
MC2222   mod_cook     19.99
MC3021   mod_cook     2.99
TC3218   trad_cook    20.95
TC4203   trad_cook    11.95
TC7777   trad_cook    14.99

                      avg
                      ==========================
                      14.17

(6 row(s) affected)
Compute by

• A compute by is used to calculate subsummaries
• This construct must be used with an order by

select title_id, type, price from titles where type like '%cook%' order by type compute avg(price) by type

title_id type         price
-------- ------------ --------------------------
MC2222   mod_cook     19.99
MC3021   mod_cook     2.99

                      avg
                      ==========================
                      11.49

title_id type         price
-------- ------------ --------------------------
TC3218   trad_cook    20.95
TC4203   trad_cook    11.95
TC7777   trad_cook    14.99

                      avg
                      ==========================
                      15.96

(7 row(s) affected)
Compute/Compute by

• These can be used in the same query

select title_id,type,price from titles where type in ('business','mod_cook') order by type compute sum(price) by type compute sum(price)

title_id type         price
-------- ------------ --------------------------
BU2075   business     2.99
BU7832   business     19.99

                      sum
                      ==========================
                      22.98

title_id type         price
-------- ------------ --------------------------
MC2222   mod_cook     19.99
MC3021   mod_cook     2.99

                      sum
                      ==========================
                      22.98

                      sum
                      ==========================
                      45.96

(7 row(s) affected)
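
Compute is specific to Sybase and MS SQL Server, and SQLite has no compute clause. For readers without access to those servers, the same three pieces (detail rows, subsummaries, grand summary) can be approximated with separate queries. The sketch below is a swapped-in technique using group by, not the compute clause itself; it assumes Python's sqlite3 module and reuses the four rows from the result above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table titles (title_id text, type text, price real)")
conn.executemany(
    "insert into titles values (?, ?, ?)",
    [
        ("BU2075", "business", 2.99),
        ("BU7832", "business", 19.99),
        ("MC2222", "mod_cook", 19.99),
        ("MC3021", "mod_cook", 2.99),
    ],
)

# Detail rows, ordered the way compute by would require.
detail_rows = conn.execute(
    "select title_id, type, price from titles order by type, title_id"
).fetchall()

# Stand-in for "compute sum(price) by type": one subtotal per type.
subtotals = dict(
    conn.execute("select type, sum(price) from titles group by type").fetchall()
)

# Stand-in for the trailing "compute sum(price)": one grand total.
grand_total = conn.execute("select sum(price) from titles").fetchone()[0]

for title_type, subtotal in sorted(subtotals.items()):
    print(title_type, round(subtotal, 2))
print("total", round(grand_total, 2))
conn.close()
```

Unlike compute, this approach returns the summaries as separate result sets, so the caller decides how to interleave them with the detail rows.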
Compute/Compute by

Restrictions
• With a compute/compute by, you can only use columns in the select list

select title_id,type from titles…compute sum(price) would return a syntax error

• You must order by the compute by column

• You can use any aggregate except count(*)
Compute/Compute by

Restrictions
• Columns listed after the compute by must be in the identical order to or a subset of those listed after the order by

• Expressions must be in the same left-to-right order

• Compute by must start with the same expressions as listed after order by and not skip any expressions
Compute/Compute by

Legal
• order by a,b,c
• compute by a,b,c
• compute by a,b
• compute avg(price) by a

Illegal
• order by a,b,c
• compute by b,a,c
• compute by c,a
• compute avg(price) by b
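
Taken together, the restrictions say that the compute by list must be a leading prefix of the order by list. The following sketch checks that rule against the legal and illegal cases above. The function name is_legal_compute_by is hypothetical and exists only for this illustration; it is not part of any DBMS.

```python
def is_legal_compute_by(order_by, compute_by):
    """Return True when compute_by is a non-empty leading prefix of order_by."""
    if not compute_by or len(compute_by) > len(order_by):
        return False
    # Same expressions, same left-to-right order, starting at the first
    # order by expression and skipping none.
    return list(order_by[: len(compute_by)]) == list(compute_by)


order_by = ["a", "b", "c"]
legal_cases = [["a", "b", "c"], ["a", "b"], ["a"]]
illegal_cases = [["b", "a", "c"], ["c", "a"], ["b"]]

for case in legal_cases + illegal_cases:
    print(case, is_legal_compute_by(order_by, case))
```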
Unions

• There are times when we want to return two or more sets of data within a single select statement

• Examples of this are combining data from two different tables when they have mutually exclusive criteria

• To do this we use a union


Unions

select au_lname, state from authors where state = 'CA' union select au_lname, state from authors where state = 'MA'

au_lname                                 state
---------------------------------------- -----
Bennet                                   CA
Carson                                   CA
Dull                                     CA
Green                                    CA
Gringlesby                               CA
Hunter                                   CA
Karsen                                   CA
Locksley                                 CA
MacFeather                               CA
McBadden                                 CA
O'Leary                                  CA
Straight                                 CA
Stringer                                 CA
White                                    CA
Yokomoto                                 CA
Burns                                    MA
Johnson                                  MA
Smithe                                   MA
Smythe                                   MA

(19 row(s) affected)
Unions

• The only restrictions on unions are that the same number of columns must be in each separate result set and the datatypes must match

• You cannot union a select statement that returns 2 columns with a select that returns 3 columns

• You also can't union a result set where the first column of one select is character data and the first column of another select is numeric data
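
The column-count restriction can be tried out with a small sketch, assuming Python's sqlite3 module and four illustrative rows. Only the column-count rule is demonstrated: SQLite is loosely typed and does not enforce the datatype rule, so that part is left to the class DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table authors (au_lname text, state text)")
conn.executemany(
    "insert into authors values (?, ?)",
    [("Bennet", "CA"), ("Carson", "CA"), ("Burns", "MA"), ("Ringer", "UT")],
)

# Two selects with the same column list combine into one result set.
combined = conn.execute(
    "select au_lname, state from authors where state = 'CA' "
    "union "
    "select au_lname, state from authors where state = 'MA' "
    "order by state, au_lname"
).fetchall()

# One column on the left and two on the right is rejected.
try:
    conn.execute(
        "select au_lname from authors union select au_lname, state from authors"
    )
    mismatch_rejected = False
except sqlite3.OperationalError:
    mismatch_rejected = True

print(combined)
print(mismatch_rejected)
conn.close()
```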
Unit 3 Review

• Nulls are used to represent the nonexistence of data
• A null doesn't equal another null
• An order by can be used to sort the result set
• The sort order is determined by the database's character set
• To remove duplicate rows from a result set use distinct
• You can perform calculations using aggregates; count(*), sum, and avg are the most common
• You can group data together by using a group by
• Group by can be combined with aggregates to perform sophisticated calculations
• A having clause performs a restriction on a group by
• Having and where behave differently due to the order in which they process the row selection
• Compute can be used to calculate grand summaries
Unit 3 Review cont.

• Compute by can be used to calculate subsummaries
• Unions allow us to combine multiple result sets and return them to the user in a group
Unit 3 Exercises

• Time allotted for exercises is 1 hour
