
Basic Oracle SQL

March 3, 2006

Joe Fuller
Capacity Planning & Performance Tuning Analyst
Ed Alley
MRM End User Programs Training & Support Analyst
eBay Inc. Proprietary & Confidential
Course Summary

• Basic SQL Syntax


– Constructing the SQL
– The Where Clause
– Formatting Columns and Changing Names
– Grouping/Aggregating/Summarizing
– Having Clause
– Sub-Queries
– Statistical Functions (Listing)
• Joins
– Join types
– Discussion of each type of Join
• Other topics



Basic Oracle SQL

I. Basic SQL Syntax


II. Joins
III. Other topics



Construction of a Query

• First, let’s do some definitions.

• SQL = Structured Query Language (pronounced “Sea-kwull”)
• Query = A request for information from a relational database.

There are several sections to an SQL statement.

COLUMN LIST (select)      Which table columns are we retrieving?
FROM LIST                 From which tables or views?
WHERE CLAUSE (filters)    How are we joining these tables, and which rows do we want to retrieve?
OTHER CONDITIONS          Do we want to ORDER (sort), GROUP, or apply other filters to the statement?



Constructing the SQL

SELECT *                      -- column list; the “*” equates to “all”
FROM dw_lstg_item             -- table name
;

SELECT                        -- column list
     user_id
    ,user_name
    ,user_slctd_id
    ,email
FROM dw_users                 -- table name
ORDER BY user_slctd_id        -- sort order
;

The indentation improves readability and the ability to spot errors.

Q1: Can you easily spot the error?

SELECT user_id, user_name, user_slctd_id,
,email ,state ,feedbackscore FROM dw_users
ORDER BY email;


Conditional Expressions

• Standard Equality and non-Equality Conditions.


– = Equal to
– != or <> Not Equal to
– > Greater Than
– >= Greater Than or Equal To
– < Less Than
– <= Less Than or Equal To
– LIKE Used for pattern matching, e.g. LIKE '%FULLER%'. The '%' means essentially “anything”, so this LIKE expression matches 'Joe Fuller', 'Fuller and Company', etc.
– NOT LIKE Excludes rows based on the LIKE pattern matching.
– IN / NOT IN Determines row inclusion (exclusion) based on the value of one or more columns being in a list of values or in the rows returned from a sub-query. (Sub-Queries are covered later.)
– IS NULL Has no value; the column is empty.
– IS NOT NULL Has a value.
– EXISTS / NOT EXISTS Similar to IN / NOT IN; tests whether a sub-query returns any rows.


NULL – What is it?

A null represents any of three things:


• an empty column
• an unknown value
• an unknowable value

Nulls are neither values nor do they signify values; they represent the absence of value. A
null is a place holder indicating that no value is present.
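
A minimal sketch of what "absence of value" means in practice, assuming a small hypothetical users table and using Python's built-in sqlite3 module as a stand-in engine (not the warehouse itself). Comparing a column to NULL with "=" never qualifies a row; IS NULL / IS NOT NULL are the tests to use.

```python
import sqlite3

# Hypothetical three-row table; SQLite is only a convenient stand-in engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, cellphone TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "555-0100"), (2, None), (3, None)],
)

def count_where(predicate):
    """Return how many rows of the users table satisfy the given predicate."""
    sql = f"SELECT COUNT(*) FROM users WHERE {predicate}"
    return conn.execute(sql).fetchone()[0]

equals_null = count_where("cellphone = NULL")        # never true, so no rows qualify
is_null = count_where("cellphone IS NULL")           # rows with no value
is_not_null = count_where("cellphone IS NOT NULL")   # rows with a value

print(equals_null, is_null, is_not_null)
```

The first count is zero even though two rows have no cellphone, which is why the IS NULL operator exists.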



THE WHERE CLAUSE

The where clause specifies what rows are returned or qualify. This is thought of as an initial filter.

SELECT user_id
      ,user_slctd_id
      ,trim(user_name)
      ,feedback_score
      ,cellphone
FROM dw_users
WHERE feedback_score >= 100
  AND user_slctd_id NOT LIKE '%store%'
  AND user_name LIKE '%fuller%'
ORDER BY 4 desc
;



EXERCISE 1 – SIMPLE QUERY

Assuming that everyone has a user logon for eBay, find your USER_ID based on your personal e-mail address OR the user id you created for your account.

DATABASE: Access_views
TABLE: DW_USERS
COLUMNS: USER_ID
         EMAIL
FILTERS: EMAIL = 'your e-mail address'

Or find your USER_ID based on your User Selected Id (the user id you chose when you created your account):

COLUMNS: USER_ID
         USER_SLCTD_ID
FILTERS: USER_SLCTD_ID = 'your eBay logon'

If you do not have an eBay account, contact the instructor.



EXERCISE 1 - EXAMPLE

SELECT USER_ID
      ,EMAIL
FROM DW_USERS
WHERE EMAIL = 'jofuller2@comcast.net'
;

SELECT USER_ID
      ,USER_SLCTD_ID
FROM DW_USERS
WHERE USER_SLCTD_ID = 'dogbarn'
;



Formatting Columns & Changing Names

Creating a view of the Users table containing only those from Virginia.

REPLACE VIEW DW_USERS_VA AS
SELECT
     User_ID (FORMAT '9999') AS EndUserID    -- format with zero fill: 23 -> 0023; change name of column to “EndUserID”
    ,Feedback_Score (FORMAT 'ZZ9')           -- format with leading spaces: 1 -> “  1”
    ,User_Name (TITLE 'Name')                -- display column title but don’t change column name
    ,date_confirm (FORMAT 'MMM DD,YYYY')     -- format date: 1/1/2005 -> “JAN 01,2005”
FROM access_views.DW_USERS
WHERE STATE = 'VA';

Note: formatting does NOT change the data – only how it is displayed.


EXERCISE 2 – RESULTS SET FORMATTING

Using the USER_ID that you retrieved from the last exercise, perform the following query:

DATABASE NAME: Access_views


TABLE NAME: DW_USERS
COLUMNS: USER_NAME
FEEDBACK_SCORE

FILTER: USER_ID (use your USER_ID from the previous exercise)

FORMATTING: USER_NAME TITLE 'User Name'
            FEEDBACK_SCORE FORMAT 'ZZZ9' TITLE 'Feedback'



EXERCISE 2 - EXAMPLE

SELECT User_Name (TITLE 'User Name')
      ,FEEDBACK_SCORE (TITLE 'Feedback', FORMAT 'ZZZ9')
FROM access_views.DW_USERS
WHERE USER_ID = 5651699
;



Grouping/Aggregating/Summarizing

SELECT
     auct_end_dt                      -- group
    ,COUNT(*) as Nbr_of_Listings      -- aggregate functions
    ,SUM(qty_sold)
FROM dw_lstg_item
GROUP BY 1
ORDER BY 3 desc
;

When you summarize data, the column list must contain ONLY the grouped columns and the aggregate functions. This is a very common error!

Note: You may reference the column number (that is, the order in which it appears in the select list) instead of the column name.



EXERCISE 3 – SUM AND COUNT

DATABASE NAME: Access_views


TABLE NAME: dw_lstg_item

FILTER: AUCT_END_DT on ‘2006-06-05’

COLUMNS: AUCT_END_DT

AGGREGATIONS: COUNT OF LISTINGS


SUM OF QTY_SOLD

(Note: if your query runs for more than 30 seconds, abort it and contact the instructor.)



EXERCISE 3 - EXAMPLE

SELECT AUCT_END_DT
,SUM(QTY_SOLD)
,COUNT(*)
FROM DW_LSTG_ITEM
WHERE AUCT_END_DT = '2006-06-05'
GROUP BY 1
;



Having Clause

SELECT
     auct_end_dt                                -- group
    ,count(*) as Nbr_of_Listings                -- aggregate functions
    ,cast(sum(qty_sold) as decimal(18,0))
FROM dw_lstg_item
GROUP BY 1
HAVING (Nbr_of_Listings > 1000000
     OR sum(qty_sold) > 1000000)
ORDER BY 3 desc
;

The cast function changes the data type or length. In this case, it creates a data type sufficiently large to contain the result. Contrast this to format, which does not change the data – only how it is displayed.

The having clause specifies what GROUPS are returned. This is also thought of as a secondary filter.
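
A small sketch of the "initial filter" versus "secondary filter" idea, assuming a hypothetical five-row listings table and using Python's built-in sqlite3 module as a stand-in engine. WHERE removes individual rows before grouping; HAVING removes whole groups after the aggregates are computed.

```python
import sqlite3

# Hypothetical listings; the table name and the data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lstg (auct_end_dt TEXT, qty_sold INTEGER)")
conn.executemany(
    "INSERT INTO lstg VALUES (?, ?)",
    [
        ("2006-06-04", 5),
        ("2006-06-05", 1),
        ("2006-06-05", 2),
        ("2006-06-05", 0),
        ("2006-06-06", 9),
    ],
)

groups = conn.execute(
    """
    SELECT auct_end_dt, COUNT(*) AS nbr_of_listings, SUM(qty_sold) AS total_sold
    FROM lstg
    WHERE auct_end_dt >= '2006-06-05'   -- initial filter: drops individual rows
    GROUP BY auct_end_dt
    HAVING COUNT(*) > 1                 -- secondary filter: drops whole groups
    ORDER BY 3 DESC
    """
).fetchall()

print(groups)
```

Only the 2006-06-05 group survives both filters: the 2006-06-04 row is dropped by WHERE, and the single-row 2006-06-06 group is dropped by HAVING.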


Sub-Queries

• A join qualifies which rows from two or more tables will be matched to create rows of an answer set.
• The answer set can include data from one or more of the joined tables.
• A subquery qualifies which rows are to be SELECTed for input to the next level of the query. Data SELECTed in Table 2 is not output, but is used to qualify the rows SELECTed in Table 1.

[Diagram: JOIN vs. SUB-QUERY, each showing TABLE 1, TABLE 2 and RESULTS]


Sub-Queries continued

Sub-queries are useful for filtering rows based on whether a value in the table being queried also exists (or does not exist) in a second table. There are essentially two types of sub-queries: one uses the IN / NOT IN operators, while the second uses the EXISTS / NOT EXISTS operators.

SELECT *
FROM dw_lstg_item
WHERE item_site_id IN (SELECT site_id
                       FROM dw_sites
                       WHERE site_cntry_id = 77)
;

OR

SELECT *
FROM dw_lstg_item
WHERE EXISTS (SELECT 1 -- this is a dummy value
              FROM dw_sites
              WHERE dw_sites.site_id = dw_lstg_item.item_site_id
                AND dw_sites.site_cntry_id = 77)
;
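
As a rough check that the two forms qualify the same rows, here is a sketch with hypothetical sites and items tables, using Python's built-in sqlite3 module as a stand-in engine. The table names and data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sites (site_id INTEGER, site_cntry_id INTEGER);
    CREATE TABLE items (item_id INTEGER, item_site_id INTEGER);
    INSERT INTO sites VALUES (0, 1), (3, 77), (77, 77);
    INSERT INTO items VALUES (10, 0), (11, 3), (12, 77), (13, 99);
    """
)

# IN form: the sub-query returns the list of qualifying site ids.
in_rows = conn.execute(
    """
    SELECT item_id FROM items
    WHERE item_site_id IN (SELECT site_id FROM sites WHERE site_cntry_id = 77)
    ORDER BY item_id
    """
).fetchall()

# EXISTS form: the correlated sub-query is tested once per outer row.
exists_rows = conn.execute(
    """
    SELECT item_id FROM items
    WHERE EXISTS (SELECT 1 FROM sites
                  WHERE sites.site_id = items.item_site_id
                    AND sites.site_cntry_id = 77)
    ORDER BY item_id
    """
).fetchall()

print(in_rows, exists_rows)
```

Note that no column from the sites table appears in the output; it is only used to qualify the rows of items.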


EXERCISE 4 – PUTTING IT ALL TOGETHER

DATABASE NAME: Access_views


TABLE NAME: DW_LSTG_ITEM

REQUIREMENT: Get the ITEM_SITE_ID and COUNT for all auctions with an
end date of ‘2006-06-05’ belonging to SITE_CNTRY_ID of 1.

COLUMNS: ITEM_SITE_ID

AGGREGATIONS: COUNT

SUB-QUERY: ITEM_SITE_ID IN
DATABASENAME Access_Views
TABLENAME DW_SITES
FILTER SITE_CNTRY_ID = 1



EXERCISE 4 - EXAMPLE

SELECT ITEM_SITE_ID
,COUNT(*)
FROM dw_lstg_item
WHERE item_site_id IN(SELECT site_id
FROM dw_sites
WHERE site_cntry_id = 1)
AND AUCT_END_DT = '2006-06-05'
GROUP BY 1
;
(Why is this not the same?)
SELECT ITEM_SITE_ID
,COUNT(*)
FROM dw_lstg_item
WHERE item_site_id = 1
AND AUCT_END_DT = '2006-06-05'
GROUP BY 1
;


Basic Oracle SQL

I. Basic SQL Syntax


II. Joins
III. Other topics



Joins – The Foundation of Relational Databases

These are Join “types”:

• INNER JOIN
• LEFT OUTER JOIN
• RIGHT OUTER JOIN
• FULL OUTER JOIN
• CROSS JOIN

Oracle supports each of these Join Types! (Not every MPP system does.)


INNER JOIN

The syntax for an INNER JOIN is simple! (Also known as a SIMPLE JOIN or EQUI-JOIN.)

SELECT A.item_id
      ,A.auct_end_dt
      ,B.net_rev_blng_curncy
FROM dw_lstg_item A
    ,dw_lstg_item_rev B
WHERE A.item_id = B.item_id
  AND A.auct_end_dt = B.auct_end_dt
;

• This query only retrieves rows based on the following join conditions. The row must exist in each table, thus the term “INNER JOIN”.
  – Item_ID matches
  – Auct_End_DT matches



ANSI SYNTAX FOR JOINS

Here is the same INNER JOIN written in the ANSI SQL-92 syntax. You must use this syntax for any join other than INNER.

SELECT A.item_id
,A.auct_end_dt
,B.net_rev_blng_curncy
FROM dw_lstg_item A
INNER JOIN dw_lstg_item_rev B
ON A.item_id = B.item_id
AND A.auct_end_dt = B.auct_end_dt
;
(You may also simplify INNER JOIN to JOIN. The INNER is assumed.)
FROM dw_lstg_item A
JOIN dw_lstg_item_rev B
ON A.item_id = B.item_id
AND A.auct_end_dt = B.auct_end_dt


INNER JOIN

SELECT I.ITEM_ID
      ,I.AUCT_END_DT
      ,C.AUCT_TYPE_DESC
FROM DW_LSTG_ITEM I
INNER JOIN DW_AUCTION_CODES C
   ON C.AUCT_TYPE_CODE = I.AUCT_TYPE_CODE;

[Diagram: DW_AUCTION_CODES, DW_LSTG_ITEM and the RESULTS SET]

Note: Only those rows where the AUCT_TYPE_CODE exists in BOTH tables show in the results set!


Left Outer Join

A LEFT OUTER JOIN returns all rows from the “left” table, and only those rows that match from the “right” table. This is handy when some rows in the “left” table may have missing or unmatched values, but you still want to keep those rows and get a description or other information from the “right” table where one exists.

SELECT A.item_id
,A.auct_end_dt
,COALESCE(B.curncy_desc,'Dunno') AS CURRENCY_DESC
FROM dw_lstg_item A
LEFT OUTER JOIN dw_currencies B
ON A.lstg_curncy_id = B.curncy_id
;

You may also abbreviate LEFT OUTER JOIN to LEFT JOIN.


FROM dw_lstg_item A
LEFT JOIN dw_currencies B
ON A.lstg_curncy_id = B.curncy_id
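
To make the difference from an INNER JOIN concrete, here is a sketch with two hypothetical tables, using Python's built-in sqlite3 module as a stand-in engine. Item 2 has a currency id with no match and item 3 has no currency id at all; both are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (item_id INTEGER, lstg_curncy_id INTEGER);
    CREATE TABLE currencies (curncy_id INTEGER, curncy_desc TEXT);
    INSERT INTO items VALUES (1, 1), (2, 7), (3, NULL);
    INSERT INTO currencies VALUES (1, 'US Dollar');
    """
)

# INNER JOIN: only items whose currency id is found in currencies.
rows_inner = conn.execute(
    """
    SELECT A.item_id, B.curncy_desc
    FROM items A
    INNER JOIN currencies B ON A.lstg_curncy_id = B.curncy_id
    ORDER BY A.item_id
    """
).fetchall()

# LEFT OUTER JOIN: every item; COALESCE fills in the missing descriptions.
rows_left = conn.execute(
    """
    SELECT A.item_id, COALESCE(B.curncy_desc, 'Dunno')
    FROM items A
    LEFT OUTER JOIN currencies B ON A.lstg_curncy_id = B.curncy_id
    ORDER BY A.item_id
    """
).fetchall()

print(rows_inner)
print(rows_left)
```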


LEFT OUTER JOIN

SELECT I.ITEM_ID
      ,I.AUCT_END_DT
      ,COALESCE(C.AUCT_TYPE_DESC, 'Unknown')
FROM DW_LSTG_ITEM I
LEFT OUTER JOIN DW_AUCTION_CODES C
   ON C.AUCT_TYPE_CODE = I.AUCT_TYPE_CODE;

[Diagram: DW_AUCTION_CODES, DW_LSTG_ITEM and the RESULTS SET]

Note: Every row from the LEFT OUTER table (DW_LSTG_ITEM) is returned, along with those values from DW_AUCTION_CODES that exist. The COALESCE function replaces a NULL value with 'Unknown'.

Right Outer Join

This join is the mirror image of the LEFT OUTER JOIN: it returns all rows from the “right” table, and only those rows that match from the “left” table. There is no difference in the functionality of these two. Just be careful when designating the left and right tables!

SELECT B.item_id
,B.auct_end_dt
,COALESCE(A.curncy_desc,'Dunno') AS CURRENCY_DESC
FROM dw_currencies A
RIGHT OUTER JOIN dw_lstg_item B
ON B.lstg_curncy_id = A.curncy_id
;
The RIGHT OUTER JOIN may be simplified to RIGHT JOIN.

FROM dw_currencies A
RIGHT JOIN dw_lstg_item B
ON B.lstg_curncy_id = A.curncy_id
;


RIGHT OUTER JOIN

SELECT I.ITEM_ID
      ,I.AUCT_END_DT
      ,COALESCE(C.AUCT_TYPE_DESC, 'Unknown')
FROM DW_AUCTION_CODES C
RIGHT OUTER JOIN DW_LSTG_ITEM I
   ON C.AUCT_TYPE_CODE = I.AUCT_TYPE_CODE;

[Diagram: DW_AUCTION_CODES, DW_LSTG_ITEM and the RESULTS SET]

Note: Every row from the RIGHT OUTER table (DW_LSTG_ITEM) is returned, along with those values from DW_AUCTION_CODES that exist. The COALESCE function replaces a NULL value with 'Unknown'.

FULL OUTER JOIN

This interesting join is used when you want to retrieve every row, whether there is a match or not based
on the join criteria! When would you use this? An example is detecting “true changes” for a table.

SELECT COALESCE(ORIG.item_id,STG.item_id) AS item_id
,CASE
WHEN ORIG.item_id IS NULL
AND STG.item_id IS NOT NULL
THEN 'INSERT'
WHEN ORIG.item_id IS NOT NULL
AND STG.item_id IS NOT NULL
THEN 'UPDATE'
WHEN ORIG.item_id IS NOT NULL
AND STG.item_id IS NULL
THEN 'DELETE'
ELSE 'UNKNOWN'
END AS Process_type
FROM dw_lstg_item ORIG
FULL OUTER JOIN dw_lstg_item_w STG
ON STG.item_id = ORIG.item_id
;
The COALESCE function (more on that later) ensures that we have an item_id to work with in subsequent processing.
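
The INSERT / UPDATE / DELETE classification can be sketched in plain Python, with two hypothetical dictionaries keyed by item_id standing in for the original and staging tables. This only mirrors the logic of the CASE expression; it is not how the database executes the join.

```python
# Hypothetical snapshots keyed by item_id: the current table and the staging table.
orig = {101: "old title", 102: "kept", 103: "to be removed"}
stg = {101: "new title", 102: "kept", 104: "brand new"}

def classify(orig_rows, stg_rows):
    """Mimic a FULL OUTER JOIN on item_id and label each key like the CASE expression."""
    result = {}
    # The union of keys plays the role of COALESCE(ORIG.item_id, STG.item_id).
    for item_id in sorted(orig_rows.keys() | stg_rows.keys()):
        in_orig = item_id in orig_rows
        in_stg = item_id in stg_rows
        if in_stg and not in_orig:
            result[item_id] = "INSERT"
        elif in_stg and in_orig:
            result[item_id] = "UPDATE"
        else:
            result[item_id] = "DELETE"
    return result

process_type = classify(orig, stg)
print(process_type)
```

As on the slide, a key present on both sides is labelled UPDATE whether or not its values actually differ.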


FULL OUTER JOIN

• From the previous page, we are joining DW_LSTG_ITEM with DW_LSTG_ITEM_W with a FULL OUTER JOIN.
• Those rows that exist in both indicate that DW_LSTG_ITEM will be updated.
• Those rows that exist in DW_LSTG_ITEM but not in DW_LSTG_ITEM_W will be deleted. (This is ONLY for example!)
• Those rows that do not exist in DW_LSTG_ITEM but do exist in DW_LSTG_ITEM_W will be inserted into DW_LSTG_ITEM. (New rows)

[Diagram: DW_LSTG_ITEM, DW_LSTG_ITEM_W and RESULTS]



CROSS JOINS (Cartesian Products)

We see these joins, generally, when they are unintentionally created. In the CROSS JOIN every row in the “left” table is joined to every row in the “right” table! So, if you have a billion-row table and CROSS JOIN it to a 100-row table, your answer set will have 100 billion rows! There are uses for deliberate CROSS JOINs in SQL that will be discussed later.
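
The row-count arithmetic can be checked on a tiny scale. This sketch assumes a hypothetical 4-row items table and 3-row codes table, using Python's built-in sqlite3 module as a stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (item_id INTEGER, auct_type_code INTEGER);
    CREATE TABLE codes (auct_type_code INTEGER, auct_type_desc TEXT);
    INSERT INTO items VALUES (1, 1), (2, 1), (3, 2), (4, 9);
    INSERT INTO codes VALUES (1, 'Chinese'), (2, 'Dutch'), (9, 'Fixed Price');
    """
)

# With a join condition, each item pairs only with its matching code.
joined = conn.execute(
    """
    SELECT COUNT(*)
    FROM items A
    INNER JOIN codes B ON A.auct_type_code = B.auct_type_code
    """
).fetchone()[0]

# Without one, every item row pairs with every code row.
crossed = conn.execute(
    "SELECT COUNT(*) FROM items A CROSS JOIN codes B"
).fetchone()[0]

print(joined, crossed)
```

Four rows times three rows gives twelve; the same multiplication is what turns a billion-row table into 100 billion rows.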

SELECT dw_lstg_item.item_id
,dw_lstg_item.auct_end_dt
,COALESCE(B.auct_type_desc,'Dunno') AS AUCT_TYPE
FROM dw_lstg_item A
INNER JOIN dw_auction_codes B
ON A.auct_type_code = B.auct_type_code
;

Q: What is wrong with this query? Why will it perform a CROSS JOIN (unintentionally)?



CROSS JOINS (Cartesian Products)

From the previous page:


SELECT dw_lstg_item.item_id
,dw_lstg_item.auct_end_dt
,COALESCE(B.auct_type_desc,'Dunno') AS AuctDesc
FROM dw_lstg_item A
INNER JOIN dw_auction_codes B
ON A.auct_type_code = B.auct_type_code
;

[Diagram: DW_AUCTION_CODES (aliased as B), DW_LSTG_ITEM (aliased as A), DW_LSTG_ITEM (referred to in the SELECT)]



CROSS JOINS (Cartesian Products)

And this is the unintended CARTESIAN RESULTS!

QUESTION: Why? Let’s discuss.



A “Real Life” example…

SELECT
                U.USER_ID,
                U.USER_CRE_PRD_ID AS RegMonth,
                upper(DW_USERS.GENDER_MFU) AS Gender,
                CASE WHEN U.DATE_OF_BIRTH IS NULL THEN CAST('1900-01-01' AS Date) ELSE CAST(U.DATE_OF_BIRTH AS Date) END AS BirthDate,
                L.AUCT_END_DT,
                CASE
                                WHEN L.AUCT_TYPE_CODE IN (1,2,4,5) THEN 'Auction'
                                WHEN L.AUCT_TYPE_CODE IN (9) THEN 'FP'
                                WHEN L.AUCT_TYPE_CODE IN (7) THEN 'SIF'
                                ELSE 'Other'
                END AS ListType,
                CASE
                                WHEN C.USER_DEFINED_FIELD1 IN (4) THEN 'BI'
                                WHEN C.USER_DEFINED_FIELD1 IN (1,10,11,14,22,26,28,35) THEN 'Collectibles'
                                WHEN C.USER_DEFINED_FIELD1 IN (23) THEN 'Real Estate'
                                WHEN C.USER_DEFINED_FIELD1 IN (27,99) THEN 'Everything Else'
                                WHEN C.USER_DEFINED_FIELD1 IN (12,13,20,21,31,33,34,38,40) THEN 'Technology'
                                WHEN C.USER_DEFINED_FIELD1 IN (3,18,19,30,32) THEN 'Media'
                                WHEN C.USER_DEFINED_FIELD1 IN (5,7,8,41) THEN 'Motors'
                                WHEN C.USER_DEFINED_FIELD1 IN (2,6,9,15,16,17,24,25,29,36,37,39) THEN 'Lifestyle'
                                ELSE 'Unknown'                    
                END AS Vertical
FROM
                DW_USERS As U
                INNER JOIN dw_user_cofala AS COFA ON COFA.USER_ID=U.USER_ID
                INNER JOIN DW_LSTG_ITEM AS L on COFA.COFA_BID_ITEM_ID=L.ITEM_ID
                INNER JOIN DW_CATEGORY_GROUPINGS AS C ON COFA.COFA_BID_SITE_ID=C.SITE_ID AND L.LEAF_CATEG_ID=C.LEAF_CATEG_ID
WHERE
                U.USER_CNTRY_ID=101 AND (U.USER_ID MOD 100 = 12)

Question: Can you spot the culprit?


Don’t Mix Syntax!

If your query will have both INNER and OUTER joins, use ANSI syntax for the entire query! When syntaxes are mixed, the query is hard to follow, and it is hard to ensure that you do not have an unintentional PRODUCT JOIN!

Q: What is being joined in this query?


SELECT …
FROM dw_lstg_item
,dw_lstg_item_rev
LEFT OUTER JOIN dw_currencies
ON dw_currencies.curncy_id = dw_lstg_item.item_curncy_id
,dw_calendar
,dw_category_groupings
WHERE dw_lstg_item.item_id = dw_lstg_item_rev.item_id
AND dw_lstg_item.auct_end_dt = dw_lstg_item_rev.auct_end_dt
AND dw_calendar.cal_date = dw_lstg_item.auct_start_dt
AND dw_category_groupings.site_id = dw_lstg_item.item_site_id
AND dw_category_groupings.leaf_categ_id = dw_lstg_item.leaf_categ_id
AND dw_lstg_item_rev > 1000;


Don’t Mix Syntax (continued)

Is this clearer?
SELECT …
FROM dw_lstg_item
INNER JOIN dw_lstg_item_rev
ON dw_lstg_item.item_id = dw_lstg_item_rev.item_id
AND dw_lstg_item.auct_end_dt = dw_lstg_item_rev.auct_end_dt
LEFT OUTER JOIN dw_currencies
ON dw_currencies.curncy_id = dw_lstg_item.item_curncy_id
INNER JOIN dw_calendar
ON dw_calendar.cal_date = dw_lstg_item.auct_start_dt
INNER JOIN dw_category_groupings
ON dw_category_groupings.site_id = dw_lstg_item.item_site_id
AND dw_category_groupings.leaf_categ_id =
dw_lstg_item.leaf_categ_id
WHERE dw_lstg_item_rev > 1000;



EXERCISE 5 – JOINS

DATABASE: Access_Views
TABLENAME(S): DW_LSTG_ITEM
DW_LSTG_ITEM_COLD

COLUMNS: AUCT_END_DT

AGGREGATIONS: SUM of QTY_SOLD (dw_lstg_item)


AVG of VISITCOUNT (dw_lstg_item_cold)

FILTERS: AUCT_END_DT = ‘2006-06-05’


(Hint: for this exercise, please apply to BOTH tables!)



EXERCISE 5 - EXAMPLE

SELECT dw_lstg_item.AUCT_END_DT
, AVG( dw_lstg_item_cold.VISITCOUNT)
,SUM( dw_lstg_item.QTY_SOLD)

FROM dw_lstg_item
INNER JOIN dw_lstg_item_cold
ON dw_lstg_item_cold.ITEM_ID = dw_lstg_item.ITEM_ID
AND dw_lstg_item_cold.AUCT_END_DT = dw_lstg_item.AUCT_END_DT

WHERE dw_lstg_item.AUCT_END_DT = '2006-06-05'


AND dw_lstg_item_cold.AUCT_END_DT = '2006-06-05'
GROUP BY 1
;



Basic Oracle SQL

I. Basic SQL Syntax


II. Joins
III. Other topics



DERIVED TABLES

• A derived table defines a temporary named result set from which the query can select
data.
• Derived tables do not function like a sub-query: a derived table is not used to limit rows; it returns data, just like a view or a query against a table.
• Follow these rules when using derived tables:
– A unique table name is required for the derived table.
– Qualified column names are mandatory when you specify otherwise ambiguous
column names in the select list used to build derived tables.
– This rule is parallel to, and consistent with, the rules for creating a view.
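
A minimal sketch of a derived table, assuming hypothetical listings and countries tables and using Python's built-in sqlite3 module as a stand-in engine. The inner SELECT is summarized first and named X; the outer query then selects from X as if it were a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE lstg (slr_id INTEGER, slr_cntry_id INTEGER);
    CREATE TABLE countries (cntry_id INTEGER, cntry_desc TEXT);
    INSERT INTO lstg VALUES (1, 1), (1, 1), (2, 1), (3, 77), (3, 77);
    INSERT INTO countries VALUES (1, 'Country A'), (77, 'Country B');
    """
)

rows = conn.execute(
    """
    SELECT C.cntry_desc, X.sellers
    FROM (SELECT slr_cntry_id, COUNT(DISTINCT slr_id) AS sellers
          FROM lstg
          GROUP BY slr_cntry_id
         ) AS X                              -- the derived table needs a name
    JOIN countries C ON C.cntry_id = X.slr_cntry_id
    ORDER BY X.sellers DESC
    """
).fetchall()

print(rows)
```

Five listing rows collapse to two summary rows before the join, so the join to countries touches two rows rather than five.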



DERIVED TABLE - EXAMPLE

SELECT X.APPLICATION_ID (NAMED "App ID")
      ,DW_COUNTRIES.CNTRY_DESC (NAMED "Seller Country")
      ,DW_SITES.SITE_NAME (NAMED "Site")
      ,X.Sellers (NAMED "Sellers")
FROM (SELECT APPLICATION_ID
            ,SLR_CNTRY_ID
            ,ITEM_SITE_ID
            ,COUNT(DISTINCT SLR_ID) AS "Sellers"
      FROM dw_lstg_item
      WHERE (dw_lstg_item.LSTG_STATUS_ID = 1)
        AND (dw_lstg_item.AUCT_TYPE_CODE <> 12)
        AND dw_lstg_item.WACKO_YN = 'N'
        AND (dw_lstg_item.AUCT_END_DT
             BETWEEN '2005-10-01' AND '2005-12-31')
      GROUP BY 1,2,3
     ) AS X
    ,DW_COUNTRIES
    ,DW_SITES
WHERE X.SLR_CNTRY_ID = DW_COUNTRIES.CNTRY_ID
  AND X.ITEM_SITE_ID = DW_SITES.SITE_ID
ORDER BY 1, 2;

Named the derived table 'X'. That is how we reference the table and the columns in the query.



EXERCISE 6 – DERIVED TABLES

DATABASE: Access_views
TABLENAME(S): DW_LSTG_ITEM
DW_COUNTRIES

REQUIREMENT: Under the principle of “Summarize first, then join”, use a derived
table to count the DISTINCT sellers by country id,
then display the country name and the count.

COLUMN(S): CNTRY_DESC TITLE ‘Seller Country’


SLR_ID
CNTRY_ID (DW_COUNTRIES)
SLR_CNTRY_ID (DW_LSTG_ITEM)
AGGREGATIONS: COUNT(DISTINCT SLR_ID) TITLE ‘Seller Count’

FILTER(S): AUCT_END_DT = ‘2006-06-05’

JOIN CONDITIONS: DW_COUNTRIES.CNTRY_ID = (derived table).SLR_CNTRY_ID




EXERCISE 6 - EXAMPLE

SELECT DW_COUNTRIES.CNTRY_DESC (TITLE 'Seller Country')
,X.Sellers (TITLE 'Seller Count')
FROM (SELECT SLR_CNTRY_ID
, COUNT(DISTINCT SLR_ID) AS Sellers
FROM dw_lstg_item
WHERE ( dw_lstg_item.AUCT_END_DT = '2006-06-05' )
GROUP BY 1
) AS X
, DW_COUNTRIES
WHERE X.SLR_CNTRY_ID = DW_COUNTRIES.CNTRY_ID
ORDER BY 2 DESC
;



Set Operations

Set operations do not perform row-level filtering. The SQL set operators manipulate the results sets of two or more queries by combining the results of each individual query into a single results set.

Set operations supported by Oracle are:

• INTERSECT  Returns result rows that appear in all answer sets generated by the individual SELECT statements.
• MINUS  Returns those rows returned by the first SELECT except for those also selected by the second SELECT.
• UNION  Combines the results of two or more SELECT statements. There will be no repeating rows.
• UNION ALL  Combines the results of two or more SELECT statements; there may be repeating rows.
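
The four operators can be compared side by side on two tiny hypothetical result sets. This sketch uses Python's built-in sqlite3 module as a stand-in engine; note the assumption that SQLite spells MINUS as EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE q1 (code INTEGER);
    CREATE TABLE q2 (code INTEGER);
    INSERT INTO q1 VALUES (1), (9);
    INSERT INTO q2 VALUES (1), (12);
    """
)

def run(operator):
    """Combine the two hypothetical result sets with the given set operator."""
    sql = f"SELECT code FROM q1 {operator} SELECT code FROM q2 ORDER BY code"
    return [row[0] for row in conn.execute(sql)]

intersect_rows = run("INTERSECT")   # rows present in both sets
minus_rows = run("EXCEPT")          # SQLite's spelling of MINUS
union_rows = run("UNION")           # duplicates removed
union_all_rows = run("UNION ALL")   # duplicates kept

print(intersect_rows, minus_rows, union_rows, union_all_rows)
```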



INTERSECT OPERATOR

[Diagram: TABLE A, TABLE B and the RESULT]

SELECT I.ITEM_ID
,I.AUCT_END_DT
,I.AUCT_TYPE_CODE
,C.AUCT_TYPE_DESC
FROM DW_LSTG_ITEM I
INNER JOIN DW_AUCTION_CODES C
ON C.AUCT_TYPE_CODE = I.AUCT_TYPE_CODE
INTERSECT
SELECT I.ITEM_ID
,I.AUCT_END_DT
,I.AUCT_TYPE_CODE
,C.AUCT_TYPE_DESC
FROM DW_LSTG_ITEM I
INNER JOIN DW_AUCTION_CODES C
ON C.AUCT_TYPE_CODE = I.AUCT_TYPE_CODE
WHERE C.AUCT_TYPE_CODE = 1



INTERSECT OPERATOR (continued)

[Diagram: DW_LSTG_ITEM (all) INTERSECT DW_LSTG_ITEM (Auction Type 1) gives DW_LSTG_ITEM (Results)]



MINUS OPERATOR

[Diagram: RESULTS = TABLE A MINUS TABLE B]

SELECT I.ITEM_ID
,I.AUCT_END_DT
,I.AUCT_TYPE_CODE
,C.AUCT_TYPE_DESC
FROM DW_LSTG_ITEM I
INNER JOIN DW_AUCTION_CODES C
ON C.AUCT_TYPE_CODE = I.AUCT_TYPE_CODE
MINUS
SELECT I.ITEM_ID
,I.AUCT_END_DT
,I.AUCT_TYPE_CODE
,C.AUCT_TYPE_DESC
FROM DW_LSTG_ITEM I
INNER JOIN DW_AUCTION_CODES C
ON C.AUCT_TYPE_CODE = I.AUCT_TYPE_CODE
WHERE C.AUCT_TYPE_CODE = 1



MINUS OPERATOR (continued)

[Diagram: DW_LSTG_ITEM (all) MINUS DW_LSTG_ITEM (Auction Type 1) gives DW_LSTG_ITEM (Results)]



UNION SET OPERATOR

[Diagram: TABLE A and TABLE B combined into the RESULTS]

SELECT I.ITEM_ID
,I.AUCT_END_DT
,I.AUCT_TYPE_CODE
,C.AUCT_TYPE_DESC
FROM DW_LSTG_ITEM I
INNER JOIN DW_AUCTION_CODES C
ON C.AUCT_TYPE_CODE = I.AUCT_TYPE_CODE
WHERE C.AUCT_TYPE_CODE = 1
UNION
SELECT I.ITEM_ID
,I.AUCT_END_DT
,I.AUCT_TYPE_CODE
,C.AUCT_TYPE_DESC
FROM DW_LSTG_ITEM I
INNER JOIN DW_AUCTION_CODES C
ON C.AUCT_TYPE_CODE = I.AUCT_TYPE_CODE
WHERE C.AUCT_TYPE_CODE = 9



UNION SET OPERATOR (continued)

[Diagram: DW_LSTG_ITEM (Auction Type 1) UNION DW_LSTG_ITEM (Auction Type 9) gives DW_LSTG_ITEM (Results)]



EXERCISE 7 - SET OPERATIONS

For this exercise you will be given the queries. Perform each of the following set operations between the two queries and note the results.

QUERY 1:
SELECT AUCT_END_DT
      ,AUCT_TYPE_CODE
      ,COUNT(*)
FROM DW_LSTG_ITEM
WHERE AUCT_END_DT = '2006-06-05'
  AND AUCT_TYPE_CODE IN(1,9)
GROUP BY 1,2;

QUERY 2:
SELECT AUCT_END_DT
      ,AUCT_TYPE_CODE
      ,COUNT(*)
FROM DW_LSTG_ITEM
WHERE AUCT_END_DT = '2006-06-05'
  AND AUCT_TYPE_CODE IN(1,12)
GROUP BY 1,2;

• INTERSECT
• MINUS
• UNION
• UNION ALL (not discussed, try it and note the difference in the results!)



Sampling

• Sampling can produce similar results without having to go through every row in the
table
• Useful for
– Averages – average call duration
– Percent contribution
– Browsing data
• Assumes random and uniform distribution of the data
• 10,000 rows is often a “statistically significant” sample size
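
A rough illustration of why a random sample can stand in for a full scan when estimating a percent contribution. The population below is synthetic (an assumption made up for this sketch), and Python's random module stands in for the database's sampling.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000,000 rows, 11% of which are in category 23.
population = [23] * 110_000 + [0] * 890_000

# Draw 10,000 rows at random, without replacement.
sample = random.sample(population, 10_000)

actual_pct = 100 * population.count(23) / len(population)
sample_pct = 100 * sample.count(23) / len(sample)

print(f"actual: {actual_pct:.2f}%  sample: {sample_pct:.2f}%")
```

The estimate is only trustworthy under the assumption stated on the slide: the sampled rows must be drawn randomly and uniformly from the data.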



SAMPLING

select actvty_categ_id
      ,count(*) as SampleCount
      ,cast(SampleCount as decimal(18,4)) / 10000 * 100 as SamplePct
from (select actvty_categ_id
      from access_views.dw_dc_network_actvty
      where ad_dt = current_date-4
      sample 10000
     ) x
group by 1
order by 3 desc;

ACTVTY_CATEG_ID   Sample Count   Sample Pct   Actual Count   Actual Pct
23                1,119          11.19        209,784        11.20%
818               1,100          11.00        205,985        11.00%
17                899            8.99         168,326        8.99%
814               626            6.26         117,226        6.26%
751               615            6.15         115,189        6.15%
750               526            5.26         98,537         5.26%
117               211            2.11         39,605         2.11%
18                210            2.10         39,320         2.10%
118               199            1.99         37,208         1.99%
810               166            1.66         31,130         1.66%
116               129            1.29         24,117         1.29%
27                113            1.13         21,078         1.13%
More…

1% sample:

Select *
From dw_dc_network_actvty
Sample .01

What’s incorrect here?

select actvty_categ_id
      ,count(*) as SampleCount
      ,cast(SampleCount as decimal(18,4)) / 10000 * 100 as SamplePct
from access_views.dw_dc_network_actvty
where ad_dt = current_date-4
group by 1
order by 3 desc
sample 10000;

EXERCISE 8 - SAMPLING

DATABASE NAME: Access_views


TABLENAME: DW_USERS

COLUMN(S): FEEDBACK_SCORE

AGGREGATIONS: AVG FEEDBACK_SCORE (of 1,000 users)

REQUIREMENT: Use the SAMPLE function (HINT: you need a derived table) to get
the average feedback score of a sample of 1000 users.



EXERCISE 8 - EXAMPLE

SELECT AVG(X.FEEDBACK_SCORE)
FROM (SELECT FEEDBACK_SCORE
FROM DW_USERS
SAMPLE 1000) X;

Why won’t this work??

SELECT AVG(FEEDBACK_SCORE)
FROM DW_USERS
SAMPLE 1000;



CASE Statement

EXAMPLE:
SELECT USER_ID
,CASE SUBSTR(EMAIL,POSITION('@' IN EMAIL)+1,CHARACTERS(EMAIL))
WHEN 'aol.com' THEN 'America On Line'
WHEN 'msn.com' THEN 'Evil Empire'
ELSE 'Other'
END AS ISP
,CASE
WHEN EXTRACT(MONTH FROM user_cre_date) IN ( 1, 2, 3) THEN 1
WHEN EXTRACT(MONTH FROM user_cre_date) IN ( 4, 5, 6) THEN 2
WHEN EXTRACT(MONTH FROM user_cre_date) IN ( 7, 8, 9) THEN 3
WHEN EXTRACT(MONTH FROM user_cre_date) IN (10,11,12) THEN 4
END As JoinQuarter
FROM DW_USERS;

Note: This SQL uses the CASE statement in two different ways.
1. Simple form: evaluates a single expression once and compares its VALUE against each 'WHEN'.
2. Searched form: evaluates a potentially different condition for each 'WHEN'.
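
Both forms of CASE can be tried on a two-row hypothetical users table. This sketch uses Python's built-in sqlite3 module as a stand-in engine, so SQLite's substr, instr and strftime functions replace the SUBSTR / POSITION / EXTRACT calls on the slide; that substitution is an assumption of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, email TEXT, user_cre_date TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "a@aol.com", "2005-02-14"), (2, "b@example.com", "2005-11-30")],
)

rows = conn.execute(
    """
    SELECT user_id
          -- Simple form: one expression, compared with each WHEN value.
          ,CASE substr(email, instr(email, '@') + 1)
              WHEN 'aol.com' THEN 'America On Line'
              ELSE 'Other'
           END AS isp
          -- Searched form: each WHEN carries its own condition.
          ,CASE
              WHEN CAST(strftime('%m', user_cre_date) AS INTEGER) IN (1, 2, 3) THEN 1
              WHEN CAST(strftime('%m', user_cre_date) AS INTEGER) IN (4, 5, 6) THEN 2
              WHEN CAST(strftime('%m', user_cre_date) AS INTEGER) IN (7, 8, 9) THEN 3
              ELSE 4
           END AS join_quarter
    FROM users
    ORDER BY user_id
    """
).fetchall()

print(rows)
```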



Set Tagging – Minimize Scans
( Grouping and Displaying Data by Common Criteria)

UNION method:

SELECT auct_start_dt
      ,auct_end_dt
      ,'Chinese Auction' AuctionType
FROM DW_LSTG_ITEM
WHERE HIGH_BDR_ID = 5651699
  AND auct_end_dt >= '2006-01-01'
  AND auct_type_code = 1
UNION
SELECT auct_start_dt
      ,auct_end_dt
      ,'Dutch Auction' AS AuctionType
FROM DW_LSTG_ITEM
WHERE HIGH_BDR_ID = 5651699
  AND auct_end_dt >= '2006-01-01'
  AND auct_type_code = 2;

CASE method:

SELECT auct_start_dt
      ,auct_end_dt
      ,CASE auct_type_code
          WHEN 1 THEN 'Chinese Auction'
          WHEN 2 THEN 'Dutch Auction'
          ELSE 'Other'
       END AS AuctionType
FROM DW_LSTG_ITEM
WHERE HIGH_BDR_ID = 5651699
  AND auct_end_dt >= '2006-01-01'
  AND auct_type_code IN(1,2) ;

auct_start_dt   auct_end_dt   auction_type
1/22/2006       1/29/2006     Chinese Auction
1/27/2006       2/3/2006      Dutch Auction
2/3/2006        2/10/2006     Other

Note: Only the case statement will return the “Other” results.

The case statement method is more efficient because it requires only ONE full table scan rather than TWO.

EXERCISE 9 – CASE STATEMENT

DATABASE NAME: Access_Views


TABLE NAME(S): DW_LSTG_ITEM

COLUMN(S): AUCT_END_DT

AGGREGATIONS: COUNT or SUM of SUCCESSFUL LISTINGS (LSTG_STATUS_ID = 1)
              COUNT of TOTAL LISTINGS
              BY AUCTION TYPE
                  1 = Chinese Auction
                  2 = Dutch Auction
                  9 = Fixed Price
                  ELSE “Other”

FILTER(S): AUCT_END_DT = ‘2006-06-05’

REQUIREMENTS: Get a count of the number of successful listings and total listings
for all auctions ending on 2006-06-05.



EXERCISE 9 – EXAMPLE

SELECT AUCT_END_DT
      ,CASE AUCT_TYPE_CODE
          WHEN 1 THEN 'Chinese Auction'
          WHEN 2 THEN 'Dutch Auction'
          WHEN 9 THEN 'Fixed Price'
          ELSE 'Other'
       END AS Auction_Type
      ,SUM(CASE
              WHEN LSTG_STATUS_ID = 1
              THEN 1
              ELSE 0
           END) AS SuccessCnt
      ,COUNT(*) AS TotalCnt
FROM DW_LSTG_ITEM
WHERE AUCT_END_DT = '2006-06-05'
GROUP BY 1,2
;


Date Arithmetic

SELECT auct_start_dt
,auct_end_dt
,CASE auct_type_code
WHEN 1 THEN 'Chinese Auction'
WHEN 2 THEN 'Dutch Auction'
ELSE 'Other'
END As AuctionType
,auct_end_dt - auct_start_dt AS AuctionLengthDays
FROM DW_LSTG_ITEM
WHERE HIGH_BDR_ID = 5651699
And auct_end_dt >= '2006-01-01'

AUCT_START_DT   AUCT_END_DT   AuctionType       AuctionLengthDays
1/21/2006       1/28/2006     Chinese Auction   7
10/25/2005      2/21/2006     Other             119
2/21/2006       2/28/2006     Chinese Auction   7
2/20/2006       2/21/2006     Chinese Auction   1



Date Arithmetic continued

Show the date in 3 years:

SELECT CURRENT_DATE + INTERVAL '3' YEAR;
2007-01-20

Show the date 3 months ago:

SELECT CURRENT_DATE + INTERVAL -'3' MONTH;
2003-10-20

Show the date 2 days ago:

SELECT CURRENT_DATE - INTERVAL '2' DAY;
2004-01-18

Add 5 years and 10 months to 2 years and 3 months:

SELECT (INTERVAL '5-10' YEAR TO MONTH) +
       (INTERVAL '2-03' YEAR TO MONTH);
8-01

(The sample results above assume a current date of 2004-01-20.)

Extracting Parts of Dates & Time

EXTRACT ({YEAR | MONTH | DAY} FROM date)

Extract the year from Jan. 01, 2004 (date1).

SELECT EXTRACT (YEAR FROM date1) FROM dates;
2004

EXTRACT ({HOUR | MINUTE | SECOND} FROM time)

Extract the hour portion from the time 10:35:40 (time1).

SELECT EXTRACT (HOUR FROM time1) FROM times;
10


DATE FUNCTIONS

• ADD_MONTHS
– Adds an integer number (positive or negative) of months to a DATE or
TIMESTAMP expression and normalizes the result.

EXAMPLE:
SELECT ADD_MONTHS('2006-03-01',7);

Returns 2006-10-01

SELECT ADD_MONTHS('2006-01-31',1);

Returns 2006-02-28

SELECT ADD_MONTHS('2006-03-01',-7);

Returns 2005-08-01
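
The "normalizes the result" behaviour can be sketched in Python. The add_months helper below is a hypothetical illustration written for this sketch, not the database's implementation; it clamps the day to the last day of the target month, which reproduces the three results shown above. Real database engines may differ in other end-of-month edge cases.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Shift d by a whole number of months, clamping the day to the month's end."""
    # Count months since year 0 so that negative offsets cross year boundaries cleanly.
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

plus_seven = add_months(date(2006, 3, 1), 7)
end_of_jan = add_months(date(2006, 1, 31), 1)    # there is no Feb 31, so the day is clamped
minus_seven = add_months(date(2006, 3, 1), -7)

print(plus_seven, end_of_jan, minus_seven)
```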


Exercise 10 – Manipulating Dates

First, get the first day of the current month.


HINT: Use the EXTRACT (DAY) function!

Next, find the first day of NEXT month.


HINT: You may want to use the prior example as a starting point!



EXERCISE 10 – Date Manipulation

To get the 1st day of this month:

SELECT CURRENT_DATE - EXTRACT(DAY FROM CURRENT_DATE) + 1;

(You can use this for ANY DATE column in the data warehouse! It can save on joins to the various CALENDAR tables.)

To get the 1st day of next month:

SELECT ADD_MONTHS(CURRENT_DATE - EXTRACT(DAY FROM CURRENT_DATE) + 1, 1);
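
The same "subtract the day number, add one" arithmetic can be checked in Python with the standard datetime module. The function names are hypothetical, chosen only for this sketch.

```python
from datetime import date, timedelta

def first_day_of_month(d):
    """Same arithmetic as the SQL: d - EXTRACT(DAY FROM d) + 1."""
    return d - timedelta(days=d.day) + timedelta(days=1)

def first_day_of_next_month(d):
    """Jump past the end of the current month, then snap back to day 1."""
    first = first_day_of_month(d)
    # 32 days after the 1st always lands inside the following month.
    return first_day_of_month(first + timedelta(days=32))

this_month = first_day_of_month(date(2006, 3, 3))
next_month = first_day_of_next_month(date(2006, 3, 3))
year_end = first_day_of_next_month(date(2006, 12, 15))

print(this_month, next_month, year_end)
```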



Statistical Functions


• BASIC STATISTICAL FUNCTIONS
– SUM (BusObj: PERCENT SUM)
– COUNT (BusObj: COUNT ALL)
– AVG
– MIN
– MAX
• ADVANCED STATISTICAL FUNCTIONS
– CORR, COVAR_POP, COVAR_SAMP
– KURTOSIS
– REGR_AVGX, REGR_AVGY, REGR_COUNT
– SKEW
– STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP



BASIC STATISTICAL FUNCTIONS - SUM

Returns a column value that is the arithmetic sum of a specified column in a result table.

SELECT SUM(qty_sold)
FROM dw_lstg_item;

• The SUM function used with a GROUP BY can be useful for determining the
summary of a column grouped by another column.

SELECT auct_type_code
,SUM(qty_sold)
FROM dw_lstg_item
GROUP BY auct_type_code;



BASIC STATISTICAL FUNCTIONS - COUNT

• Returns a column value that is the number of qualified rows in a result table. The example below returns a count of every row in the dw_lstg_item table.

SELECT COUNT(*)
FROM dw_lstg_item;

• Adding the operator DISTINCT to the COUNT function allows the counting of unique values. The
example below returns a count of the number of unique seller ids in the dw_lstg_item table.

SELECT COUNT(DISTINCT slr_id)
FROM dw_lstg_item;

• The COUNT function used with a GROUP BY can be useful for determining the number of rows
grouped by another column.

SELECT auct_type_code
,COUNT(*)
FROM dw_lstg_item
GROUP BY auct_type_code;
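
The COUNT variants can be compared on a hypothetical four-row listings table, using Python's built-in sqlite3 module as a stand-in engine. COUNT(column) is not covered above; it is included here as an extra point of comparison because it skips NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lstg (slr_id INTEGER, auct_type_code INTEGER)")
conn.executemany(
    "INSERT INTO lstg VALUES (?, ?)",
    [(1, 1), (1, 1), (2, 9), (None, 9)],
)

count_all, count_col, count_distinct = conn.execute(
    """
    SELECT COUNT(*)                 -- every row
          ,COUNT(slr_id)            -- rows where slr_id is not NULL
          ,COUNT(DISTINCT slr_id)   -- unique non-NULL seller ids
    FROM lstg
    """
).fetchone()

per_type = conn.execute(
    "SELECT auct_type_code, COUNT(*) FROM lstg GROUP BY auct_type_code ORDER BY 1"
).fetchall()

print(count_all, count_col, count_distinct, per_type)
```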


BASIC STATISTICAL FUNCTIONS - AVG

• Returns a column value that is the arithmetic average of a specified column in a result
table. The query below returns an average of the bold_fee_usd from the
dw_lstg_item_rev table.

SELECT AVG(bold_fee_usd)
FROM dw_lstg_item_rev;

• The AVG function used with a GROUP BY can be useful for determining the average of a value, grouped by another column. The query below gets the daily average of the bold_fee_usd from the dw_lstg_item_rev table.

SELECT auct_end_dt
,AVG(bold_fee_usd)
FROM dw_lstg_item_rev
GROUP BY auct_end_dt;



BASIC STATISTICAL FUNCTIONS – MIN & MAX

• MIN returns a column value that is the minimum value in column_name in a result table. The query below returns the earliest auct_end_dt value from the dw_lstg_item table.

SELECT MIN(auct_end_dt)
FROM dw_lstg_item;

• MAX returns a column value that is the maximum value in column_name in a result table. The query below returns the latest auct_end_dt value from the dw_lstg_item table.

SELECT MAX(auct_end_dt)
FROM dw_lstg_item;



ADVANCED STATISTICAL FUNCTIONS –
CORR, COVAR_POP, COVAR_SAMP

• CORR
– Returns the Pearson product-moment correlation coefficient of its arguments for all non-null data point pairs.
– The Pearson product-moment correlation coefficient is a measure of the linear association between variables. The computed coefficient ranges from -1.00 to +1.00.
– Note that high correlation does not imply a causal relationship between the variables.
• COVAR_POP
– Returns the population covariance of its arguments for all non-null data point pairs.
– Covariance measures whether or not two random variables vary in the same way. The population covariance is the average of the products of deviations for each non-null data point pair (the sum divided by n).
• COVAR_SAMP
– Returns the sample covariance of its arguments for all non-null data point pairs.
– Covariance measures whether or not two random variables vary in the same way. The sample covariance is the sum of the products of deviations for each non-null data point pair, divided by n - 1.
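
The three definitions can be written out directly in Python. These helper functions are hypothetical illustrations of the textbook formulas, not the database's implementation; None plays the role of NULL, and pairs containing it are skipped.

```python
import math

def non_null_pairs(xs, ys):
    """Keep only the data point pairs where both values are present."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]

def sum_of_products(pairs):
    """Sum of the products of deviations from the two means."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs)

def covar_pop(xs, ys):
    pairs = non_null_pairs(xs, ys)
    return sum_of_products(pairs) / len(pairs)          # divide by n

def covar_samp(xs, ys):
    pairs = non_null_pairs(xs, ys)
    return sum_of_products(pairs) / (len(pairs) - 1)    # divide by n - 1

def corr(xs, ys):
    """Pearson coefficient: covariance scaled by both standard deviations."""
    pairs = non_null_pairs(xs, ys)
    px = [x for x, _ in pairs]
    py = [y for _, y in pairs]
    return covar_pop(px, py) / math.sqrt(covar_pop(px, px) * covar_pop(py, py))

xs = [1, 2, 3, 4, None]
ys = [2, 4, 6, 8, 10]   # the last pair is ignored because its x is missing

print(covar_pop(xs, ys), covar_samp(xs, ys), corr(xs, ys))
```

Because ys is exactly twice xs for the four usable pairs, the correlation comes out at the upper bound of the range.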



Questions ?
