Show Me Question #1
A relational database is a set of data tables that are linked or related to each
other. It is used to store different types of information that can be pulled
together to answer specific analytical questions. It's a useful way to minimize the
amount of data stored on a server without losing any critical information.
SQL, then, is simply the language used to communicate with this relational database.
Databases don't yet understand human languages like English (they're simply too
syntactically complex), so we use a standardized language that we know will be
understood.
There are multiple types of JOIN clauses, and they all serve slightly different
functions:
INNER JOIN returns a list of rows for which there is a match in both tables
specified. It's the default join type, so if you just type JOIN without specifying
a join type, an INNER JOIN will be used.
LEFT JOIN will return all rows from the left table in your statement, matched
against rows in the right table when possible. If a row in the left table does not
have a corresponding match in the right table, it will still be listed, with
NULL values in the columns for the right table.
RIGHT JOIN will return all rows from the right table in your statement,
matched against rows in the left table when possible. If a row in the right table
does not have a corresponding match in the left table, it will still be listed,
with NULL values in the columns for the left table.
FULL JOIN will return all rows from both the left and the right tables in your
statement. If there are rows in the left table that do not match the right table
or vice versa, all data will still be pulled in, but SQL will output NULL values
in all columns that are not matched.
CROSS JOIN returns the Cartesian product of two tables; in other words, each
individual row of the left table matched with each individual row of the right
table.
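The differences between join types are easiest to see on a tiny data set. The following is a minimal sketch, not part of the original questions: it uses Python's built-in sqlite3 module as a stand-in for a MySQL server, and the two small tables and their rows are made up for illustration. RIGHT JOIN and FULL JOIN are left out because older SQLite versions do not support them.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    -- Hypothetical sample rows: customer 3 has no orders.
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN: every customer, with NULL (None in Python) where no order matches.
left_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# CROSS JOIN: every customer paired with every order (3 x 3 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM customers CROSS JOIN orders"
).fetchone()[0]

print(inner_rows)   # [('Ann', 10), ('Ann', 11), ('Bob', 12)]
print(left_rows)    # [('Ann', 10), ('Ann', 11), ('Bob', 12), ('Cy', None)]
print(cross_count)  # 9
```

Note how the customer without orders disappears from the INNER JOIN result but survives the LEFT JOIN with a NULL order_id.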
SELECT * FROM orders;
1000 rows in set (0.05 sec)
And 23 of those orders are from the user with customer_id = 45:
SELECT * FROM orders WHERE customer_id = 45;
23 rows in set (0.10 sec)
Yet, when we SELECT the orders that are not from customer_id = 45, we
only get 973 results:
SELECT * FROM orders WHERE customer_id <> 45;
973 rows in set (0.11 sec)
973 + 23 = 996. But shouldn't the number of orders with customer_id equal to 45
plus the number of orders with customer_id not equal to 45 equal 1000? Why is this
query not returning the expected results?
The answer: this data set most likely contains orders with a NULL customer_id.
In a WHERE clause, comparing NULL with either the = or the <> operator yields
NULL rather than true, so rows with a NULL customer_id match neither condition.
Our second query above could be modified as follows to produce the expected
results:
SELECT * FROM orders WHERE (customer_id <> 45 OR customer_id IS NULL);
977 rows in set (0.11 sec)
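The same effect can be reproduced on a handful of rows. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the five sample rows and the `count` helper are hypothetical and chosen only to show that the = and <> counts do not add up to the total when a NULL is present.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)"
)
# Hypothetical sample data: two orders for customer 45, two for other
# customers, and one order whose customer_id is NULL.
conn.executemany(
    "INSERT INTO orders (customer_id) VALUES (?)",
    [(45,), (45,), (7,), (8,), (None,)],
)


def count(where):
    """Count the rows of orders that satisfy the given WHERE condition."""
    sql = f"SELECT COUNT(*) FROM orders WHERE {where}"
    return conn.execute(sql).fetchone()[0]


total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
equal_45 = count("customer_id = 45")
not_45 = count("customer_id <> 45")
not_45_or_null = count("customer_id <> 45 OR customer_id IS NULL")

print(total, equal_45, not_45, not_45_or_null)  # 5 2 2 3
```

Here 2 + 2 falls one short of the 5 total rows, and adding the IS NULL check recovers the missing row, mirroring the 973 versus 977 counts above.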
4. Why does one of these queries work while the other
does not?
Consider the following query, which returns the expected results:
SELECT CASE WHEN (3 IN (1, 2, 3, NULL)) THEN 'Three is here!' ELSE "Three isn't here!" END AS result;
/*
+----------------+
| result         |
+----------------+
| Three is here! |
+----------------+
1 row in set (0.00 sec)
*/
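Now consider the following similar query, which does not return the expected results:
SELECT CASE WHEN (3 NOT IN (1, 2, NULL)) THEN "Three isn't here!" ELSE 'Three is here!' END AS result;
/*
+----------------+
| result         |
+----------------+
| Three is here! |
+----------------+
1 row in set (0.00 sec)
*/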
Three is not included in the second set, so why does our query mistakenly deliver
the output "Three is here!"?
The answer, once again, has to do with the way MySQL handles NULL values. Let's
take a closer look. In our first query, we ask whether the value 3 is included in the set
(1, 2, 3, NULL). Our statement is functionally equivalent to the following:
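SELECT CASE WHEN ((3 = 1) OR (3 = 2) OR (3 = 3) OR (3 = NULL)) THEN 'Three is here!' ELSE "Three isn't here!" END AS result;
/*
+----------------+
| result         |
+----------------+
| Three is here! |
+----------------+
1 row in set (0.00 sec)
*/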
Since 3 is definitely equal to 3, one of our OR conditions is met, and the statement
outputs "Three is here!". Our second statement, on the other hand, asks whether
the value 3 is NOT included in the set (1, 2, NULL). This statement is functionally
equivalent to the following:
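SELECT CASE WHEN ((3 <> 1) AND (3 <> 2) AND (3 <> NULL)) THEN "Three isn't here!" ELSE 'Three is here!' END AS result;
/*
+----------------+
| result         |
+----------------+
| Three is here! |
+----------------+
1 row in set (0.00 sec)
*/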
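In this case, the conditional check 3 <> NULL does not evaluate to true. In
ANSI-standard SQL, any comparison with NULL using = or <> yields NULL (unknown),
so the whole AND chain is NULL, the WHEN condition is not met, and the ELSE branch
is returned. To test for NULL we need to use IS NULL or IS NOT NULL rather than
the = or <> operators.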
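The three-valued logic behind this result can be checked directly. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; both follow the standard NULL semantics for IN and NOT IN), and the `scalar` helper is a hypothetical convenience for evaluating a single expression.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")


def scalar(expr):
    """Evaluate a single SQL expression and return its value."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]


# A match is found, so the NULL in the list does not matter: result is 1 (true).
in_with_match = scalar("3 IN (1, 2, 3, NULL)")

# No match, and 3 <> NULL is unknown, so the result is NULL (None in Python).
not_in_with_null = scalar("3 NOT IN (1, 2, NULL)")

# Without the NULL, NOT IN behaves as expected: result is 1 (true).
not_in_without_null = scalar("3 NOT IN (1, 2)")

# A NULL condition is not true, so CASE falls through to the ELSE branch.
case_result = scalar(
    "CASE WHEN 3 NOT IN (1, 2, NULL) "
    "THEN 'Three isn''t here!' ELSE 'Three is here!' END"
)

print(in_with_match, not_in_with_null, not_in_without_null, case_result)
```

A practical takeaway is to make sure the list or subquery given to NOT IN cannot contain NULL, for example by filtering with IS NOT NULL first.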
CREATE TABLE `orders` (
  `order_id` int(11) NOT NULL AUTO_INCREMENT,
  `customer_id` int(11) NOT NULL,
  `order_placed_date` date NOT NULL,
  PRIMARY KEY (`order_id`),
  KEY `customer_id` (`customer_id`),
  FOREIGN KEY (`customer_id`) REFERENCES `customers` (`customer_id`)
);
Can you construct a simple SELECT statement that uses an INNER JOIN to combine
all information from both the customers and orders tables?
SELECT * FROM orders INNER JOIN customers ON orders.customer_id = customers.customer_id;
SELECT order_id, customer_id, YEAR(order_placed_date) AS order_year FROM orders WHERE
Here's the answer: order_year is an alias, meaning that it's being used as another
name for a more complex expression: YEAR(order_placed_date). In MySQL, column
aliases can be referenced in GROUP BY, ORDER BY, and HAVING clauses, but they
can't be used in WHERE clauses, because the WHERE clause is evaluated before the
aliased value has been computed. Running the above code will produce the
following result:
ERROR 1054 (42S22): Unknown column 'order_year' in 'where clause'
To fix this problem, we need to repeat the definition of the order_year alias in the
WHERE clause like so:
SELECT order_id, customer_id, YEAR(order_placed_date) AS order_year FROM orders WHERE
498 rows in set (0.00 sec)
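The corrected pattern, repeating the expression in the WHERE clause instead of the alias, can be sketched as follows. This is an illustration under several assumptions: it runs on Python's built-in sqlite3 module rather than MySQL, it uses SQLite's strftime('%Y', ...) in place of MySQL's YEAR(), and the sample rows and the filter year 2016 are hypothetical. SQLite's alias rules differ from MySQL's, so the sketch shows only the portable, corrected form and does not reproduce the error.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        order_placed_date TEXT NOT NULL
    )
""")
# Hypothetical sample rows; order_id values 1, 2, 3 are assigned automatically.
conn.executemany(
    "INSERT INTO orders (customer_id, order_placed_date) VALUES (?, ?)",
    [(1, "2015-12-31"), (1, "2016-01-01"), (2, "2016-07-04")],
)

# The year expression is written out in full in both the select list and
# the WHERE clause; the alias order_year is only used to name the output.
rows = conn.execute("""
    SELECT order_id,
           CAST(strftime('%Y', order_placed_date) AS INTEGER) AS order_year
    FROM orders
    WHERE CAST(strftime('%Y', order_placed_date) AS INTEGER) = 2016
    ORDER BY order_id
""").fetchall()

print(rows)  # [(2, 2016), (3, 2016)]
```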
CREATE TABLE `products` (
  `product_id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `price` decimal(19,4) NOT NULL,
  PRIMARY KEY (`product_id`)
);
CREATE TABLE `order_products` (
  `order_product_id` int(11) NOT NULL AUTO_INCREMENT,
  `order_id` int(11) NOT NULL,
  `product_id` int(11) NOT NULL,
  PRIMARY KEY (`order_product_id`),
  KEY `order_id` (`order_id`),
  KEY `product_id` (`product_id`),
  FOREIGN KEY (`order_id`) REFERENCES `orders` (`order_id`),
  FOREIGN KEY (`product_id`) REFERENCES `products` (`product_id`)
);
Can you write a query that finds the total order price (i.e., the sum of products.price
for each order) for all order_ids?
This question is a bit tough, as we'll have to use both the SUM function and the
GROUP BY clause to aggregate orders by order_id. Here's how we do it:
SELECT order_id, SUM(price) AS total_order_price FROM order_products INNER JOIN products
1000 rows in set (0.01 sec)
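One way the SUM and GROUP BY combination described above might look in full is sketched below. It is an assumption-laden illustration rather than the original answer: it runs on Python's built-in sqlite3 module instead of MySQL, the join condition on product_id and the GROUP BY on order_id are inferred from the table definitions, the sample rows are hypothetical, and prices are stored as integers to keep the arithmetic exact (the schema above uses decimal(19,4)).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        price INTEGER NOT NULL  -- integer prices for simplicity (assumption)
    );
    CREATE TABLE order_products (
        order_product_id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL,
        product_id INTEGER NOT NULL
    );
    -- Hypothetical sample rows: order 100 has two products, order 101 has one.
    INSERT INTO products VALUES (1, 'pen', 2), (2, 'book', 10);
    INSERT INTO order_products (order_id, product_id)
    VALUES (100, 1), (100, 2), (101, 2);
""")

# Join each order line to its product, then sum the prices per order.
totals = conn.execute("""
    SELECT op.order_id, SUM(p.price) AS total_order_price
    FROM order_products AS op
    INNER JOIN products AS p ON p.product_id = op.product_id
    GROUP BY op.order_id
    ORDER BY op.order_id
""").fetchall()

print(totals)  # [(100, 12), (101, 10)]
```

Each order_id appears once in the result, with the prices of all its products added together.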
Looking for more SQL prep? Don't forget to check out our resources
page (/sql/sql-resources/)!
Copyright 2016