Database Assignment 4
Ques 1.)
Ques 2.)
Ques 3.)
3a.
(i). Output the pid of any product bought by a person named Richard Reindeer.
(ii). Output the mid of any member who lives in Portland, but has bought something in a
branch located in Austin.
(iii). Output the mid of any member who has bought something in a branch that is outside the
state where the member lives.
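The three questions can be expressed as SQL joins. The assignment's schema is not reproduced in this document, so the sketch below runs against a hypothetical one inferred from the problem statements: Member(mid, name, city, state), Branch(bid, city, state), Transaction(tid, mid, bid), and Buy(tid, pid). All table and column names here are assumptions, not the assignment's actual schema.

```python
import sqlite3

# Hypothetical schema inferred from the problem statements (the assignment's
# real schema may differ): Member(mid, name, city, state),
# Branch(bid, city, state), Transaction(tid, mid, bid), Buy(tid, pid).
# "Transaction" is quoted because it is a reserved word in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Member(mid INTEGER PRIMARY KEY, name TEXT, city TEXT, state TEXT);
    CREATE TABLE Branch(bid INTEGER PRIMARY KEY, city TEXT, state TEXT);
    CREATE TABLE "Transaction"(tid INTEGER PRIMARY KEY, mid INTEGER, bid INTEGER);
    CREATE TABLE Buy(tid INTEGER, pid INTEGER);

    -- a few made-up sample rows to exercise the queries
    INSERT INTO Member VALUES (1,'Richard Reindeer','Portland','OR'),
                              (2,'Alice','Portland','OR'),
                              (3,'Bob','Austin','TX');
    INSERT INTO Branch VALUES (10,'Austin','TX'), (11,'Portland','OR');
    INSERT INTO "Transaction" VALUES (100,1,10), (101,2,11), (102,3,10);
    INSERT INTO Buy VALUES (100,500), (100,501), (101,502);
""")

# (i) pids of products bought by a person named Richard Reindeer
pids_i = {r[0] for r in conn.execute("""
    SELECT DISTINCT b.pid
    FROM Member m
    JOIN "Transaction" t ON t.mid = m.mid
    JOIN Buy b ON b.tid = t.tid
    WHERE m.name = 'Richard Reindeer'""")}

# (ii) mids of Portland members who bought something in an Austin branch
mids_ii = {r[0] for r in conn.execute("""
    SELECT DISTINCT m.mid
    FROM Member m
    JOIN "Transaction" t ON t.mid = m.mid
    JOIN Branch br ON br.bid = t.bid
    WHERE m.city = 'Portland' AND br.city = 'Austin'""")}

# (iii) mids of members who bought in a branch outside their home state
mids_iii = {r[0] for r in conn.execute("""
    SELECT DISTINCT m.mid
    FROM Member m
    JOIN "Transaction" t ON t.mid = m.mid
    JOIN Branch br ON br.bid = t.bid
    WHERE br.state <> m.state""")}
```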
3b.
(i). Query plan for the query:
The tables do not fit into main memory, so we first scan the Member table to fetch the tuple
for Richard Reindeer. This table is 2 GB, so the scan takes about
(2,000 MB / 100 MB/s) = 20 seconds.
The result, a single tuple, clearly fits in main memory, so we can then do a blocked
nested-loop join that scans the Transaction table only once.
That table is 40 GB, so it takes (40,000 MB / 100 MB/s) = 400 s to scan, and yields 10
tuples, which also easily fit into main memory.
Finally, we read Buy once to join these tuples with the Buy tuples. The Buy table is 400 GB, so
this scan takes (400,000 MB / 100 MB/s) = 4,000 s. Overall, this query takes roughly
4,420 s, or a bit more than an hour.
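As a sanity check on the arithmetic, the scan times for this plan can be recomputed directly from the 100 MB/s scan rate and the table sizes stated above:

```python
SCAN_MB_PER_S = 100                       # sequential scan rate used in the plan

member_s = 2_000 / SCAN_MB_PER_S          # 2 GB Member table   -> 20 s
txn_s    = 40_000 / SCAN_MB_PER_S         # 40 GB Transaction   -> 400 s
buy_s    = 400_000 / SCAN_MB_PER_S        # 400 GB Buy          -> 4,000 s

total_s = member_s + txn_s + buy_s        # 4,420 s, a bit over an hour
```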
(ii). Query plan for the query:
We start by scanning Member, keeping only those members who live in Portland; this result fits
into main memory.
Then we join this result with Transaction by scanning it once, leaving us with 0.5% of 1
billion transactions, which we project down to mid and bid. Then we scan Branch
once to join with it, discarding tuples for branches not in Austin as we scan.
Therefore, the cost is that of scanning each table once, which is about 400 s
(40,000 MB / 100 MB/s) + 20 s (2,000 MB / 100 MB/s) + 0.4 ms (0.04 MB / 100 MB/s) = about 420 s.
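The same check for this plan, recomputed from the sizes in the formula above (Branch taken to be 0.04 MB):

```python
SCAN_MB_PER_S = 100                       # sequential scan rate used in the plan

txn_s    = 40_000 / SCAN_MB_PER_S         # 40 GB Transaction -> 400 s
member_s = 2_000 / SCAN_MB_PER_S          # 2 GB Member       -> 20 s
branch_s = 0.04 / SCAN_MB_PER_S           # 0.04 MB Branch    -> about 0.4 ms

total_s = txn_s + member_s + branch_s     # about 420 s
```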
3c.
For the sparse index on mid in Transaction, each index entry is of size (8+8) = 16 bytes, so
the maximum number of entries in 4 KB is about 250.
With 80% occupancy, we get about 200 entries per node.
This is a sparse index, and there are 100 tuples per page, so we have only one index entry for
every 100 tuples in Transaction.
Thus, we have 10 million index entries located in about 10 million / 200 = 50,000 leaf nodes.
On the next level, we have about 250 nodes, then 2 nodes, then the root. So there are 4 levels
of nodes in the tree, and the size is dominated by the 50,000 leaf nodes of 4 KB each (about
200 MB).
The cost of fetching a tuple is the cost of 5 block accesses (4 index levels plus the data
block itself), about 50 ms (ignoring the trivial transfer cost per block).
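The sizing above follows from a short calculation; the helper below reproduces it, assuming 200 usable entries per 4 KB node (80% occupancy) and 10 ms per random block access, as in the text:

```python
import math

def btree_stats(n_entries, fanout=200, node_kb=4, access_ms=10):
    """Levels, index size, and single-lookup cost for a B+-tree with the
    given fanout; the lookup cost counts every index level plus the data block."""
    leaves = math.ceil(n_entries / fanout)
    levels, nodes = 1, leaves
    while nodes > 1:                      # climb toward the root
        nodes = math.ceil(nodes / fanout)
        levels += 1
    size_mb = leaves * node_kb / 1000     # size is dominated by the leaves
    lookup_ms = (levels + 1) * access_ms  # index levels + the data block itself
    return levels, size_mb, lookup_ms

# Sparse index on Transaction.mid: 1 billion tuples, 100 per page
# -> one entry per page = 10 million index entries.
levels, size_mb, lookup_ms = btree_stats(10_000_000)
```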
For the index on pid in Buy, each index entry is also 16 bytes, so 200 entries per block as
before. But now we have 10 billion index entries (one for each tuple in Buy), located in 10
billion / 200 = 50 million leaf nodes. On the next level, we have 250,000 nodes, then 1,250,
then 7, then the root.
So the depth is now 5, and the access cost is 60 ms (6 block accesses: 5 index levels plus the
data block). The index is now of size 4 KB * 50 million, or 200 GB.
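The same calculation for the dense index on Buy, now with one entry per tuple (10 billion entries) and the same ~200-entry fanout per 4 KB node:

```python
import math

entries = 10_000_000_000                  # dense index: one entry per Buy tuple
fanout = 200                              # ~200 entries per 4 KB node (80% full)

leaves = math.ceil(entries / fanout)      # 50 million leaf nodes
levels, nodes = 1, leaves
while nodes > 1:                          # 250,000 -> 1,250 -> 7 -> root
    nodes = math.ceil(nodes / fanout)
    levels += 1

size_gb = leaves * 4 / 1_000_000          # 4 KB per leaf -> 200 GB
lookup_ms = (levels + 1) * 10             # 5 index levels + data block -> 60 ms
```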