You are on page 1of 8

6/1/2014

Questions every good Database/SQL developer should be able to answer - Stack Overflow
sign up

Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no
registration required.

log in

tour

help

careers 2.0

Take the 2-minute tour

Questions every good Database/SQL developer should be able to answer [closed]


I was going through Questions every good .Net developer should be able to answer and was highly
impressed with the content and approach of this question and so in the same spirit, I am asking this
question for Database/SQL Developer.
What questions do you think should a good Database/SQL programmer be able to respond to?
sql

database

edited Jun 16 '12 at 22:32

community wiki
3 revs, 3 users 76%
Rachel

locked by Bill the Lizard Jun 16 '12 at 13:50


This question exists because it has historical significance, but it is not considered a good, on-topic question
for this site, so please do not use it as evidence that you can ask similar questions here. This question and its
answers are frozen and cannot be changed. More info: help center.

closed as not constructive by Flexo , BoltClock Feb 17 '12 at 11:57


As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by
facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended
discussion. If you feel that this question can be improved and possibly reopened, visit the help center for
guidance.
If this question can be reworded to fit the rules in the help center, please edit the question.
comments disabled on deleted / locked posts

22 Answers
The different types of JOINs:
INNER JOIN
LEFT and RIGHT OUTER JOIN
FULL JOIN
CROSS JOIN
See Jeff Atwood's Visual Explanation of JOINs
What is a key? A candidate key? A primary key? An alternate key? A foreign key?
What is an index and how does it help your database?
What are the data types available and when to use which ones?
answered Jan 22 '10 at 19:27

community wiki
marc_s

If you don't thoroughly understand joins, you can't be a database programmer. This would be the first
question and I wouldn't bother asking any others if the answer was wrong. HLGEM Jan 22 '10 at 20:03

Understanding set theory in general is a prerequisite to becoming a good sql developer. So I would add
unions and intersections to your list as well. Logicalmind Jan 22 '10 at 20:44

Joins?! Nuthin' a good denormalized, monolithic table of epic proportion can't handle... :p OMG Ponies Jan
22 '10 at 22:48

Let's not insult the set theorists - grokking unions and intersections doesn't equate with "understanding set

http://stackoverflow.com/questions/2119859/questions-every-good-database-sql-developer-should-be-able-to-answer

1/8

6/1/2014

Questions every good Database/SQL developer should be able to answer - Stack Overflow
theory" klochner Jan 22 '10 at 23:06

When asking about different types of joins, don't ask interviewee to explain them but present a set of
problems that requires the interviewee use each to solve. Many "database developers" can explain joins but
not write them or know when to apply them. Samuel Neff Jan 23 '10 at 5:01

show 3 more comments

A reprint of my answer here, as general guidelines for topics.

Basics
1. SELECTing columns from a table
2. Aggregates Part 1: COUNT, SUM, MAX/ MIN
3. Aggregates Part 2: DISTINCT, GROUP BY, HAVING

Intermediate
1.
2.
3.
4.
5.
6.
7.

JOINs, ANSI-89 and ANSI-92 syntax


UNION vs UNION ALL
NULL handling: COALESCE & Native NULL handling
Subqueries: IN, EXISTS, and inline views
Subqueries: Correlated
WITH syntax: Subquery Factoring/CTE
Views

Advanced Topics
Functions, Stored Procedures, Packages
Pivoting data: CASE & PIVOT syntax
Hierarchical Queries
Cursors: Implicit and Explicit
Triggers
Dynamic SQL
Materialized Views
Query Optimization: Indexes
Query Optimization: Explain Plans
Query Optimization: Profiling
Data Modelling: Normal Forms, 1 through 3
Data Modelling: Primary & Foreign Keys
Data Modelling: Table Constraints
Data Modelling: Link/Corrollary Tables
Full Text Searching
XML
Isolation Levels
Entity Relationship Diagrams (ERDs), Logical and Physical
Transactions: COMMIT, ROLLBACK, Error Handling
answered Jan 22 '10 at 20:40

community wiki
OMG Ponies

Personally, I think several of the "Advanced" items should go under "Intermediate" or maybe "Competent."
"Advanced" would be denormalizing, partitioning, query/index hints, replication, locking and deadlocks, and
all that crazy stuff we wish we didn't have to deal with. Nevertheless, good list! Aaronaught Jan 23 '10 at
0:33
@Aaronaught: Partitioning & replication are more DBA oriented. I kinda covered locking with isolation levels.
OMG Ponies Jan 23 '10 at 1:22
Piffle. I don't believe in labels; you're a DBA whether you accept the title or not. ;) Aaronaught Jan 23 '10 at
1:28

@Aaronaught: I'm a DBA if I have the access :p OMG Ponies Jan 23 '10 at 1:39

I would put Transactions as basic - if you don't know what to do then any data saved cannot be trusted
Mark Jan 24 '10 at 18:31

show 1 more comment

Here are a few:

http://stackoverflow.com/questions/2119859/questions-every-good-database-sql-developer-should-be-able-to-answer

2/8

6/1/2014

Questions every good Database/SQL developer should be able to answer - Stack Overflow
What is normalization and why is it important?
What are some situations where you would de-normalize data?
What is a transaction and why is it important?
What is referential integrity and why is it important?
What steps would to take to investigate reports of slow database performance?
answered Jan 22 '10 at 19:39

community wiki
Phil Sandler

add comment

What is sql injection and how do you prevent it?


What is a cursor and when would you use it (or not) and why?
answered Jan 22 '10 at 19:42

community wiki
DyingCactus

add comment

I've placed this answer because Erwin Smout posted a answer that was so wrong it highlighted that there
is probably a need to specifically guard against it.
Erwin suggested:
"Why should every SELECT always include DISTINCT ?"
A more appropriate question would be: If someone were to make the claim that: "every SELECT always
include DISTINCT"; how would you comment on the claim?
If a candidate is unable to shoot the claim down in flames they either:
Don't understand the problem with the claim.
Lack in critical thinking skills.
Lack in ability to communicate technical issues.
For the record
1. Suppose your query is correct, and does not return any duplicates, then including DISTINCT
simply forces the RDBMS to check your result (zero benefit, and a lot of additional processing).
2. Suppose your query is incorrect, and does return duplicates, then including DISTINCT simply
hides the problem (again with additional processing). It would be better to spot the problem and fix
your query... it'll run faster that way.
answered Jan 24 '10 at 16:48

community wiki
Craig Young

They might have read C.J.Dates 'Database in depth' who argues for a DISTINCT by default. (see
books.google.com page 52.) schlenk Dec 6 '10 at 12:17
I was just thinking the same thing about distinct today...+1 Shawn Mar 18 '11 at 14:16
add comment

At our company, instead of asking a lot of SQL questions that anyone with a good memory can answer,
we created a SQL Developers test. The test is designed to have the candidate put together a solid
schema with normalization and RI considerations, check constraints etc. And then be able to create some
queries to produce results sets we're looking for. They create all this against a brief design specification
we give them. They are allowed to do this at home, and take as much time as they need (within reason).
answered Jan 22 '10 at 19:38

community wiki
Randy Minder

add comment

I would give a badly written query and ask them how they would go about performance tuning it.
I would ask about set theory. If you don't understand operating in sets, you can't effectively query a
relational database.
I would give them some cursor examples and ask how they would rewrite them to make them set-based.

http://stackoverflow.com/questions/2119859/questions-every-good-database-sql-developer-should-be-able-to-answer

3/8

6/1/2014

Questions every good Database/SQL developer should be able to answer - Stack Overflow
If the job involved imports and exports I would ask questions about SSIS (or other tools involved in doing
this used by other datbases). If it involved writing reports, I would want to know that they understand
aggregates and grouping (As well as Crystal Reports or SSRS or whatever ereporting tool you use).
I would ask the difference in results between these three queries:
select a.field1
, a.field2
, b.field3
from table1 a
join table2 b
on a.id = b.id
where a.field5 = 'test'
and b.field3 = 1
select a.field1
, a.field2
, b.field3
from table1 a
left join table2 b
on a.id = b.id
where a.field5 = 'test'
and b.field3 = 1
select a.field1
, a.field2
, b.field3
from table1 a
left join table2 b
on a.id = b.id and b.field3 = 1
where a.field5 = 'test'
answered Jan 22 '10 at 20:01

community wiki
HLGEM

I plead guilty: what is the difference between the last two? I thought they effectively were the same.
fireeyedboy Jan 22 '10 at 21:21
The last two will return differnt results, the first two will return the same results. Putting a condition on table b
in the where clause converts it to an inner join because the entire result set has to meet that condition which
it won't if there is no matching record in table b. (Unless that condition is something like where b.id is null.)
Can't tell you how many times I've had to fix this kind of code (just this morning as a matter of fact).
HLGEM Jan 22 '10 at 21:37

I can tell you what's the same about them: they all have strange comma placement (I understand the reason
for it, but it takes a little getting used to). devuxer Jan 22 '10 at 22:52

add comment

What is the difference between a clustered index and a nonclustered index?


Another question I would ask that is not for a specific server would be:
What is a deadlock?
edited Jan 22 '10 at 23:56

community wiki
2 revs
Jose Chama

AFAIK, that terminology is only applicable to MySQL and SQL Server OMG Ponies Jan 22 '10 at 20:18

OMG is right. This is db-specific. Other db's have the same feature. Oracle has index organized tables which
are the same as a clustered index. Not only that, but there is almost no difference between a clustered index
and a nonclustered index that includes every column in the table. The real point of the question, I assume, is
to see if the person knows about the different physical storage characteristics of index types. Logicalmind
Jan 22 '10 at 20:38
What do other vendors call it? Mark Canlas Jan 22 '10 at 21:00
@OMG - AFAIK clustered indices originated in Sybase, a version of which MSSQL was until (at least) 6.5,
after which MS pretty much recoded the whole thing; one of their better efforts all round. Sybase ASE still
looks and feels much like 6.5 on the surface. Mike Woodhouse Jan 22 '10 at 21:29

Trivia: the whole concept of indexes, including syntax and terminology, is not specified in the SQL standard!

http://stackoverflow.com/questions/2119859/questions-every-good-database-sql-developer-should-be-able-to-answer

4/8

6/1/2014

Questions every good Database/SQL developer should be able to answer - Stack Overflow
It's an implementation detail. It's amazing there's as much similarity between vendors as there is.
Bill Karwin Jan 22 '10 at 22:54
show 2 more comments

Knowing not to use, and WHY not to use:


SELECT *
answered Jan 22 '10 at 19:44

community wiki
Jack Marchetti

I do that a lot. It saves typing, and I don't have to remember the col names. What's wrong with that? gary
Jan 22 '10 at 20:14

@gary: Risks returning unnecessary data, hides column name change errors... OMG Ponies Jan 22 '10 at
20:20

@gary, should not be used in production code because if the underlying data structure changes it will break
your query. Having said that I'm as guilty as the next developer of doing so from time to time - but only when I
know the sytem is under my total control Cruachan Jan 22 '10 at 20:25

@gary: If you have text/blob columns and are returning multiple rows, you will be putting unnecessary strain
on the system. Craig Young Jan 22 '10 at 22:09

Ahem... how about the fact that unless you maintain massive indexes including every single column,
SELECT * will prevent efficient index operations using a covering index? This is the most important reason!
Aaronaught Jan 23 '10 at 0:25

show 1 more comment

An interesting question would involve relational division, or how to express a "for all" relationship, which
would require nested not exists clauses.
The question comes straigh from this link.
Given the following tables, representing pilots that can fly planes and planes in a hangar:
create table PilotSkills (
pilot_name char(15) not null,
plane_name char(15) not null
)
create table Hangar (
plane_name char(15) not null
)
Select the names of the pilots who can fly every plane in the hangar.
The answer:
select distinct pilot_name
from PilotSkills as ps1
where not exists (
select * from hangar
where not exists (
select * from PilotSkills as ps2 where
ps1.pilot_name = ps2.pilot_name and
ps2.plane_name = hangar.plane_name
)
)
Or ...
Select all stack overflow users that have accepted answers in questions tagged with the 10 most popular
programming languages.
The (possible) answer (assuming an Accepted_Answers view and a Target_Language_Tags table
with the desired tags):

http://stackoverflow.com/questions/2119859/questions-every-good-database-sql-developer-should-be-able-to-answer

5/8

6/1/2014

Questions every good Database/SQL developer should be able to answer - Stack Overflow
select distinct u.user_name
from Users as u
join Accepted_Answers as a1 on u.user_id = a1.user_id
where not exists (
select * from Target_Language_Tags t
where not exists (
select *
from Accepted_Answers as a2
join Questions as q on a2.question_id = q.question_id
join Question_Tags as qt on qt.question_id = q.question_id
where
qt.tag_name = t.tag_name and
a1.user_id = a2.user_id
)
)
edited Jan 25 '10 at 18:35

community wiki
6 revs
Jordo

It is most certainly an interesting question. However, I doubt whether there are many companies that would
need their developers to deal with such scenarios on a day-to-day basis. Craig Young Jan 24 '10 at 17:07

Fair enough, but I think that answering this question shows a good deal of logic reasoning, which is essential
for a great database developer. Jordo Jan 25 '10 at 15:34

add comment

Why should we hire you when we have a sophisticated application using a properly-optimized ORM and
implementing caching systems such as memcached?
This is a serious question, they should be able to justify their existence. As Jeff Atwood likes to say
"Hardware is Cheap, Programmers are Expensive"
edited Jan 22 '10 at 20:18

community wiki
2 revs
jcm

10 ORM and optimized is an oxymoron. An ORM makes development more efficient, not runtime performance.
And it's nice to use memcache, but you still need someone analyzing what data is most advantageous to
store in the caching layer. Bill Karwin Jan 22 '10 at 19:37

@Bill so all of that hibernate profiling, eager/lazy loading, batch fetching, none of that is optimization? And
while it is nice to have someone run statistics on the data to see what is putting the cache to use best,
wouldn't that be more of a DBA job than a SQL Developer? Heck, I don't see why developers can't do that
themselves as part of the optimization process. jcm Jan 22 '10 at 20:07

@jcm: No, they aren't optimizations. Those features you mention are compensating for the inefficiency
caused by using an ORM. They would be needed if you weren't using an ORM. :-) Bill Karwin Jan 22 '10
at 20:16

But development time can't even compare to the extra cost to the business of inefficient code. If I cost a
1,000 users five muntes a day due to inefficient code, wasn't it worth it to the business to actually spend the
extra two minutes to write efficient code in the first place. And frankly I don't see where ORMs save that
much time in development when the developer actually is an expert in SQL. HLGEM Jan 22 '10 at 21:18

@HLGEM: In fact, it seems most businesses would rather shorten time-to-market, even it means they
produce poor code. They may think they'll always have the opportunity to fix pain points later. I'm not
endorsing that practice, just observing that it's common. :-) Bill Karwin Jan 22 '10 at 22:50

show 3 more comments

Compare and contrast the differences between a sql/rdbms solution and nosql solution. You can't claim
to be an expert in any technology without knowing its strengths and weaknesses as compared to its
competitors.
edited Jan 22 '10 at 20:34

community wiki
2 revs
Logicalmind

add comment

Give an example where denomralization is preferable.


(I like this one because people come rampaging out of college looking to put everything into 3rd normal

http://stackoverflow.com/questions/2119859/questions-every-good-database-sql-developer-should-be-able-to-answer

6/8

6/1/2014

Questions every good Database/SQL developer should be able to answer - Stack Overflow
form)
answered Jan 22 '10 at 19:31

community wiki
Will

hehe been there :) R van Rijn Jan 22 '10 at 19:31


"A log file" is a good answer. You might wish to record actions that change the state of the database, but if
you link the log table to other tables the true original context of the actions can be lost. And don't even
mention cascading deletes... Will Jan 22 '10 at 19:33
should a log file even be in the database, though? Sir, I question your question! Jeff Atwood Jan 22 '10 at
19:34

Denormalization is preferable when the importance of fetching complex data quickly is greater than the cost
of cleaning up a database with bollixed data integrity. Bill Karwin Jan 22 '10 at 20:07

@Jeff Ever heard of Sarbanes Oxley? :) There are other reasons to do so, but the answer to your question is,
"sometimes, but only when you have no other choice". Dane Mar 31 '10 at 7:21

show 1 more comment

What database types had caused to you, as a developer, more trouble to understand and debug?
Expected answer, IMHO, experience with issues using different date/timestamp types and BLOBs.
When is convenient to use bitmap indexes?
answered Jan 22 '10 at 19:52

community wiki
JuanZe

add comment

What explain plan does and how interpret the results you get from it.
answered Jan 22 '10 at 21:15

community wiki
David Oneill

add comment

How do you detect and resolve concurrency issues in the application layer?
What locking paradigms are typically available, and discuss their pros and cons.
Discuss NULL values and related issues.
What is the largest database system you've worked on in terms of: # tables, # rows, # users.
Also the following platform specific (SQL Server) questions:
Discuss IDENTITY columns.
What is the timestamp datatype used for?
answered Jan 22 '10 at 21:32

community wiki
Craig Young

add comment

The application is in use 24 hours a day. Your maintenance / update window is 2 hours every month,
how do you plan to minimise disruption?
answered Jan 23 '10 at 0:08

community wiki
Tim Jarvis

add comment

Almost everything is mentioned here. I would like to share one question which I was asked by a senior
manager on database. I found the question pretty interesting and if you think about it deeply, it sort of
has lot of meaning in it.
Question was - How would you describe database to your 5 year old kid ?
answered Jan 25 '10 at 17:21

community wiki
Shamik

http://stackoverflow.com/questions/2119859/questions-every-good-database-sql-developer-should-be-able-to-answer

7/8

6/1/2014

Questions every good Database/SQL developer should be able to answer - Stack Overflow
add comment

Explain the difference between an inner and outer join.


What is a Cartesian product?
Explain 3rd normal form
answered Jan 22 '10 at 19:28

community wiki
keithwarren7

add comment

Explain
Explain
Explain
Explain

possible constraints on tables


views (and materialized)
sequences
triggers

answered Jan 22 '10 at 19:36

community wiki
R van Rijn

add comment

What are the downsides of using adhoc/on-the-fly SQL statements and what would you do instead?
This area can have a huge impact on performance and security of the db. Get this wrong, and you could
end up in a world of pain.
In what situations would you use adhoc/on-the-fly SQL statements?
Because there is always an exception to the rule :)
answered Jan 22 '10 at 19:45

community wiki
AdaTheDev

add comment

"Why should every SELECT always include DISTINCT ?"


answered Jan 22 '10 at 23:09

community wiki
Erwin Smout

I don't understand your point. Could you expand on what you mean? Logicalmind Jan 22 '10 at 23:53
Unless the intention is to check if I would challenge that garbage statement; if someone were to seriously
ask me that question, I'd tell them I'm no longer interested in the job. Craig Young Jan 23 '10 at 9:57

Okay, it seems Mr. Smout is serious about this. :/ Before you do any more damage, please take note: Your
claim is wrong, in fact the converse in more true! 1) Suppose your query is correct, and does not return any
duplicates, then including DISTINCT simply forces the RDBMS to check your result (zero benefit, and a lot of
additional processing). 2) Suppose your query is incorrect, and does return duplicates, then including
DISTINCT simply *hides the problem (again with additional processing). It would be better to spot the problem
and fix your query... it'll run faster that way. Craig Young Jan 24 '10 at 16:30

add comment

Not the answer you're looking for? Browse other questions tagged sql database or ask
your own question.

http://stackoverflow.com/questions/2119859/questions-every-good-database-sql-developer-should-be-able-to-answer

8/8

You might also like