Top 7 SQL interview questions for tech professionals

Doing an SQL interview as part of your job application?
Get some advance practice with our free sample
questions.

SQL, a programming language used to communicate with databases, is
one of the most useful skills to have when recruiting for jobs in the tech industry.
Pronounced "ess-cue-ell" and short for Structured Query Language, this
incredible tool is a must-have for analyzing large data sets. In particular, it
shines when applied to relational databases: unique tables of data that are
all related to each other in some way.

Because SQL is so ubiquitous in the tech world, many companies
conduct SQL interviews before extending job offers. This helps ensure
that job applicants (particularly for roles in project management, analytics,
business intelligence, and software engineering) are comfortable using
SQL on the job.

If you've got an upcoming SQL interview, you're probably wondering
what sorts of questions you might get. Recruiters are generally vague on
details, and it can be scary to walk into a conversation like this one blind.
So, we've provided our list of the 7 most common SQL interview
questions so that you can get some practice in before your interview. With a
little bit of advance preparation, you'll feel prepared and confident on
interview day.

1. What is a relational database, and what is SQL? Please
provide examples as part of your explanation.
Even though you probably know what relational databases are and what SQL itself
is, it can be tough to come up with coherent, simple explanations during a live
interview. To that end, be sure that you've prepared in advance to answer this simple
question. Here are some tips:

A relational database is a set of data tables that are somehow linked to or related to
each other. It is used to store different types of information that can be pulled
together to answer specific analytical questions. It's a useful way to minimize the
amount of data stored on a server without losing any critical information.

That's a bit of a vague definition, so let's take a look at a relational database in
practice. A simple version of a relational database for an online retailer might contain
two separate data tables:

Customers. A list of customer information, including customer names, contact
information, and shipping preferences. Each record in this table contains a
unique customer_id field by which the customer can be identified.
Orders. A list of orders purchased at the retailer's website. Each order listing
also contains a customer_id field, which is used to link that order's details with
the specific customer who placed the order.
Of course, we wouldn't need a multi-table database if we simply included customer
information on the Orders table. But that wouldn't be particularly efficient: if a single
customer placed multiple orders, his or her name, contact information, and shipping
preferences would be listed on multiple lines of the Orders table, leading to
unnecessary duplication and an unmanageably large database. Instead, we create a
relational database to save space and show how different pieces of data are linked
together.
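
As a rough sketch of the two-table layout described above, the snippet below builds a tiny Customers/Orders pair in an in-memory database. It assumes Python's standard-library sqlite3 module rather than the MySQL server used elsewhere in this article, and the column names and sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database (an assumption; the article's own examples target MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        email       TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (customer_id),
        item        TEXT NOT NULL
    );
    -- Hypothetical sample data: one customer who placed two orders.
    INSERT INTO customers VALUES (1, 'Ada Example', 'ada@example.com');
    INSERT INTO orders VALUES (100, 1, 'keyboard'), (101, 1, 'monitor');
""")

# The customer's details are stored once; each order only carries the customer_id.
customer_rows = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
order_rows = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 1"
).fetchone()[0]
print(customer_rows, order_rows)
```

The name and email appear in a single row no matter how many orders reference them, which is the duplication saving the paragraph above describes.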

SQL, then, is simply the language used to communicate with this relational database.
Databases don't yet understand human languages like English (it's simply too
syntactically complex), so we use a standardized language to communicate with
them that we know will be understood.

2. What are the different types of SQL JOIN clauses, and
how are they used?
In SQL, a JOIN clause is used to return a table that merges the contents of two or
more other tables together. For example, if we had two tables (one containing
information on Customers and another containing information on the Orders various
customers have placed), we could use a JOIN clause to bring them together to
create a new table: a complete list of orders by customer, with all necessary
information to make shipments.

There are multiple types of JOIN clauses, and they all serve slightly different
functions:

INNER JOIN returns a list of rows for which there is a match in both tables
specified. It's the default join type, so if you just type JOIN without specifying
a join type, an INNER JOIN will be used.
LEFT JOIN will return all results from the left table in your statement, matched
against rows in the right table when possible. If a row in the left table does not
contain a corresponding match in the right table, it will still be listed with
NULL values in columns for the right table.
RIGHT JOIN will return all results from the right table in your statement,
matched against rows in the left table when possible. If a row in the right table
does not contain a corresponding match in the left table, it will still be listed
with NULL values in columns for the left table.
FULL JOIN will return all results from both the left and the right tables in your
statement. If there are instances in which rows from the left table do not match
the right table or vice versa, all data will still be pulled in, but SQL will output
NULL values in all columns that are not matched.
CROSS JOIN returns the Cartesian product of two tables; in other words, each
individual row of the left table matched with each individual row of the right
table.
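
To make the differences concrete, here is a minimal sketch that runs three of the join types above against two tiny tables. It assumes Python's standard-library sqlite3 module instead of MySQL, and the table contents are hypothetical. RIGHT JOIN and FULL JOIN are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    -- Hypothetical data: Ada has two orders, Grace has none.
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (100, 1), (101, 1);
""")

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# LEFT JOIN: every customer, with NULL (None in Python) where no order matches.
left_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# CROSS JOIN: every customer paired with every order (2 x 2 rows).
cross_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    CROSS JOIN orders AS o
""").fetchall()

print(inner_rows)
print(left_rows)
print(len(cross_rows))
```

Grace shows up only in the LEFT JOIN result, paired with a NULL order_id, while the CROSS JOIN ignores the customer_id relationship entirely.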

3. Why is this query not returning the expected results?
We have 1000 total rows in the orders table:

SELECT * FROM orders;
1000 rows in set (0.05 sec)

And 23 of those orders are from the user with customer_id = 45:

SELECT * FROM orders WHERE customer_id = 45;
23 rows in set (0.10 sec)

Yet, when we SELECT the orders that are not from customer_id = 45, we
only get 973 results:

SELECT * FROM orders WHERE customer_id <> 45;
973 rows in set (0.11 sec)

973 + 23 = 996. But shouldn't the number of orders with customer_id equal to 45
plus the number of orders with customer_id not equal to 45 equal 1000? Why is this
query not returning the expected results?

The answer: this data set most likely contains orders with a NULL customer_id.
When a WHERE clause compares a column against a value, rows with a NULL in that
column will not match against either the = or the <> operator, because any
comparison with NULL evaluates to NULL rather than true.

Our last query above could be modified as follows to produce the expected
results:

SELECT * FROM orders WHERE (customer_id <> 45 OR customer_id IS NULL);
977 rows in set (0.11 sec)
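
The same effect can be reproduced on a much smaller scale. The sketch below assumes Python's standard-library sqlite3 module rather than MySQL, and uses five hypothetical orders (one of them with a NULL customer_id) instead of the 1000 rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)"
)
# Hypothetical data: two orders for customer 45, two for customer 7,
# and one order whose customer_id is NULL (None in Python).
conn.executemany(
    "INSERT INTO orders (customer_id) VALUES (?)",
    [(45,), (45,), (7,), (7,), (None,)],
)


def count(where_clause):
    """Count the orders matching a WHERE clause written in this script."""
    query = f"SELECT COUNT(*) FROM orders WHERE {where_clause}"
    return conn.execute(query).fetchone()[0]


total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
equal_45 = count("customer_id = 45")
not_45 = count("customer_id <> 45")
not_45_or_null = count("customer_id <> 45 OR customer_id IS NULL")

# The NULL row matches neither = nor <>, so the first two counts fall short of the total.
print(total, equal_45, not_45, not_45_or_null)
```

Only the version with the explicit IS NULL check accounts for every row that is not from customer 45.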

4. Why does one of these queries work while the other
does not?
Consider the following query, which returns the expected results:

SELECT CASE WHEN (3 IN (1, 2, 3, NULL)) THEN 'Three is here!'
            ELSE "Three isn't here!" END AS result;

/*
+----------------+
| result         |
+----------------+
| Three is here! |
+----------------+
1 row in set (0.00 sec)
*/

The output "Three is here!" is shown, because the value 3 is included in the IN
clause. But what about the following query?

SELECT CASE WHEN (3 NOT IN (1, 2, NULL)) THEN "Three isn't here!"
            ELSE 'Three is here!' END AS result;

/*
+----------------+
| result         |
+----------------+
| Three is here! |
+----------------+
1 row in set (0.00 sec)
*/

Three is not included in the second set, so why does our query mistakenly deliver
the output "Three is here!"?

The answer, once again, has to do with the way MySQL handles NULL values. Let's
take a closer look. In our first query, we ask whether the value 3 is included in the set
(1, 2, 3, NULL). Our statement is functionally equivalent to the following:

SELECT CASE WHEN ((3 = 1) OR (3 = 2) OR (3 = 3) OR (3 = NULL)) THEN 'Three is here!'
            ELSE "Three isn't here!" END AS result;

/*
+----------------+
| result         |
+----------------+
| Three is here! |
+----------------+
1 row in set (0.00 sec)
*/

Since 3 is definitely equal to 3, one of our OR conditions is met, and the statement
outputs "Three is here!". Our second statement, on the other hand, asks whether
the value 3 is NOT included in the set (1, 2, NULL). This statement is functionally
equivalent to the following:

SELECT CASE WHEN ((3 <> 1) AND (3 <> 2) AND (3 <> NULL)) THEN "Three isn't here!"
            ELSE 'Three is here!' END AS result;

/*
+----------------+
| result         |
+----------------+
| Three is here! |
+----------------+
1 row in set (0.00 sec)
*/

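In this case, the conditional check 3 <> NULL does not evaluate to true: any
comparison with NULL using = or <> yields NULL, so the whole AND chain is NULL
rather than true and the CASE expression falls through to its ELSE branch. In
ANSI-standard SQL, we need to use IS NULL (or IS NOT NULL) rather than the = or
<> operators to test for NULL values.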
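
The individual comparisons can be checked directly. This is a minimal sketch assuming Python's standard-library sqlite3 module, which follows the same three-valued logic for these expressions as described above. The scalar helper is a hypothetical convenience function, and SQL NULL comes back to Python as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(expression):
    """Evaluate a single SQL expression and return its value (hypothetical helper)."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]


# 3 = 3 is true, so the OR chain behind IN is true despite the NULL in the list.
in_result = scalar("3 IN (1, 2, 3, NULL)")

# 3 <> NULL is NULL, so the AND chain behind NOT IN is NULL, not true.
not_equal_null = scalar("3 <> NULL")
not_in_result = scalar("3 NOT IN (1, 2, NULL)")

# Without the NULL in the list, NOT IN behaves as most people expect.
not_in_without_null = scalar("3 NOT IN (1, 2)")

# A NULL condition is not true, so CASE falls through to the ELSE branch.
case_result = scalar(
    "CASE WHEN 3 NOT IN (1, 2, NULL) THEN 'Three isn''t here!' "
    "ELSE 'Three is here!' END"
)

print(in_result, not_equal_null, not_in_result, not_in_without_null, case_result)
```

A practical takeaway is to filter NULL values out of a list or subquery before using it with NOT IN.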

5. Can you construct a basic INNER JOIN?
Consider our customers and orders tables, with the following respective schema:

CREATE TABLE `customers` (
  `customer_id` int(11) NOT NULL AUTO_INCREMENT,
  `first_name` varchar(255) NOT NULL,
  `last_name` varchar(255) NOT NULL,
  `email` varchar(255) NOT NULL,
  `address` varchar(255) DEFAULT NULL,
  `city` varchar(255) DEFAULT NULL,
  `state` varchar(2) DEFAULT NULL,
  `zip_code` varchar(5) DEFAULT NULL,
  PRIMARY KEY (`customer_id`)
);

CREATE TABLE `orders` (
  `order_id` int(11) NOT NULL AUTO_INCREMENT,
  `customer_id` int(11) NOT NULL,
  `order_placed_date` date NOT NULL,
  PRIMARY KEY (`order_id`),
  KEY `customer_id` (`customer_id`),
  FOREIGN KEY (`customer_id`) REFERENCES `customers` (`customer_id`)
);

Can you construct a simple SELECT statement that uses an INNER JOIN to combine
all information from both the customers and orders tables?

The answer here is really simple. Here's how we'd do it:

SELECT * FROM orders
INNER JOIN customers ON orders.customer_id = customers.customer_id;

6. Working with the AS statement
We've written a query based on the orders table above to select all orders from the
year 2016. But something is wrong with our query. Can you figure out what it is?

SELECT order_id, customer_id, YEAR(order_placed_date) AS order_year
FROM orders
WHERE order_year = 2016;

Here's the answer: order_year is an alias, meaning that it's being used as another
name for a more complex reference: YEAR(order_placed_date). It turns out that in
MySQL, aliases can only be referenced in GROUP BY, ORDER BY, and HAVING clauses;
they can't be used in WHERE clauses, because the WHERE clause is evaluated before
the aliased value has been determined. Running the above code will produce the
following result:

ERROR 1054 (42S22): Unknown column 'order_year' in 'where clause'

To fix this problem, we need to reiterate the definition of the order_year alias in the
WHERE clause like so:

SELECT order_id, customer_id, YEAR(order_placed_date) AS order_year
FROM orders
WHERE YEAR(order_placed_date) = 2016;
498 rows in set (0.00 sec)
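
The fix can be tried out on a few rows. This is a sketch under several assumptions: it uses Python's standard-library sqlite3 module rather than MySQL, SQLite has no YEAR() function so strftime('%Y', ...) stands in for it (and returns the year as text), and the four sample orders are hypothetical. SQLite is also more lenient than MySQL about aliases, so the sketch does not reproduce the error itself. The second query shows a different technique from the one in the article: wrapping the aliased query in a derived table, whose columns an outer WHERE clause can reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id          INTEGER PRIMARY KEY,
        customer_id       INTEGER NOT NULL,
        order_placed_date TEXT NOT NULL
    )
""")
# Hypothetical sample data; order_id values are assigned automatically as 1 to 4.
conn.executemany(
    "INSERT INTO orders (customer_id, order_placed_date) VALUES (?, ?)",
    [(1, "2015-12-31"), (1, "2016-01-15"), (2, "2016-07-04"), (2, "2017-03-01")],
)

# Approach from the article: repeat the aliased expression in the WHERE clause.
repeated = conn.execute("""
    SELECT order_id, strftime('%Y', order_placed_date) AS order_year
    FROM orders
    WHERE strftime('%Y', order_placed_date) = '2016'
    ORDER BY order_id
""").fetchall()

# Alternative: define the alias in a derived table and filter on it outside.
derived = conn.execute("""
    SELECT order_id, order_year
    FROM (
        SELECT order_id, strftime('%Y', order_placed_date) AS order_year
        FROM orders
    ) AS dated_orders
    WHERE order_year = '2016'
    ORDER BY order_id
""").fetchall()

print(repeated)
print(derived)
```

Both queries return the same two orders from 2016.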

7. Using the SUM function
Consider the following database schema:

CREATE TABLE `products` (
  `product_id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `price` decimal(19,4) NOT NULL,
  PRIMARY KEY (`product_id`)
);

CREATE TABLE `order_products` (
  `order_product_id` int(11) NOT NULL AUTO_INCREMENT,
  `order_id` int(11) NOT NULL,
  `product_id` int(11) NOT NULL,
  PRIMARY KEY (`order_product_id`),
  KEY `order_id` (`order_id`),
  KEY `product_id` (`product_id`),
  FOREIGN KEY (`order_id`) REFERENCES `orders` (`order_id`),
  FOREIGN KEY (`product_id`) REFERENCES `products` (`product_id`)
);

Can you write a query that finds the total order price (i.e., the sum of products.price
for each order) for all order_ids?

This question is a bit tough, as we'll have to use both the SUM function and the GROUP
BY clause to aggregate orders by order_id. Here's how we do it:

SELECT order_id, SUM(price) AS total_order_price
FROM order_products
INNER JOIN products ON order_products.product_id = products.product_id
GROUP BY order_id;
1000 rows in set (0.01 sec)
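
Here is a small end-to-end run of the same join-then-aggregate pattern. The sketch assumes Python's standard-library sqlite3 module in place of MySQL, simplifies the schema (no foreign key to an orders table, whole-number prices), and uses hypothetical products and orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        price      NUMERIC NOT NULL
    );
    CREATE TABLE order_products (
        order_product_id INTEGER PRIMARY KEY,
        order_id         INTEGER NOT NULL,
        product_id       INTEGER NOT NULL REFERENCES products (product_id)
    );
    -- Hypothetical data: order 10 contains both products, order 11 only one.
    INSERT INTO products VALUES (1, 'pen', 2), (2, 'notebook', 5);
    INSERT INTO order_products (order_id, product_id)
    VALUES (10, 1), (10, 2), (11, 2);
""")

# Join each order line to its product price, then add up the prices per order.
totals = conn.execute("""
    SELECT order_id, SUM(price) AS total_order_price
    FROM order_products
    INNER JOIN products ON order_products.product_id = products.product_id
    GROUP BY order_id
    ORDER BY order_id
""").fetchall()

print(totals)
```

Without the GROUP BY clause, SUM would collapse every order line into a single grand total instead of one total per order_id.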

Looking for more SQL prep? Don't forget to check out our resources
page (/sql/sql-resources/)!

Copyright 2016
