
Mrunal Patekar

mpp333

Database Assignment 4

Ques 1.)

Ques 2.)

Ques 3.)
3a.
(i). Output the pid of any product bought by person named Richard Reindeer.

(ii). Output the mid of any member who lives in Portland, but has bought something in a
branch located in Austin.

(iii) Output the mid of any member who has bought something in a branch that is outside the
state where the member lives.
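The three queries above can be sketched as runnable SQL, using sqlite3 and a schema inferred from the query plans in 3b. The table and column names here (Member(mid, name, city, state), Branch(bid, city, state), Trans(tid, mid, bid), Buy(tid, pid)) are assumptions, not the assignment's official schema; Transaction is renamed Trans because TRANSACTION is a reserved word in SQLite.

```python
import sqlite3

# Assumed schema, with one tiny sample row per table to exercise the queries.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE Member(mid INTEGER, name TEXT, city TEXT, state TEXT);
CREATE TABLE Branch(bid INTEGER, city TEXT, state TEXT);
CREATE TABLE Trans(tid INTEGER, mid INTEGER, bid INTEGER);
CREATE TABLE Buy(tid INTEGER, pid INTEGER);
INSERT INTO Member VALUES (1, 'Richard Reindeer', 'Portland', 'OR');
INSERT INTO Branch VALUES (10, 'Austin', 'TX');
INSERT INTO Trans VALUES (100, 1, 10);
INSERT INTO Buy VALUES (100, 55);
""")

# (i) pids of products bought by a person named Richard Reindeer
q1 = c.execute("""
SELECT DISTINCT b.pid
FROM Member m JOIN Trans t ON m.mid = t.mid JOIN Buy b ON t.tid = b.tid
WHERE m.name = 'Richard Reindeer'
""").fetchall()

# (ii) mids of Portland members who bought something in an Austin branch
q2 = c.execute("""
SELECT DISTINCT m.mid
FROM Member m JOIN Trans t ON m.mid = t.mid JOIN Branch br ON t.bid = br.bid
WHERE m.city = 'Portland' AND br.city = 'Austin'
""").fetchall()

# (iii) mids of members who bought in a branch outside their home state
q3 = c.execute("""
SELECT DISTINCT m.mid
FROM Member m JOIN Trans t ON m.mid = t.mid JOIN Branch br ON t.bid = br.bid
WHERE br.state <> m.state
""").fetchall()

print(q1, q2, q3)  # with the sample rows: [(55,)] [(1,)] [(1,)]
```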

3b.
(i). Query plan for the query:
The tables do not fit into main memory, so we scan the Member table to fetch the tuple for Richard Reindeer. This table is 2 GB, so scanning it takes about (2000 MB)/(100 MB/s) = 20 seconds.
The result, a single tuple, clearly fits in main memory, so we can then do a blocked nested-loop join that scans the Transaction table only once.
That table is 40 GB, so it takes (40000 MB)/(100 MB/s) = 400 s to scan, and yields 10 tuples, which also clearly fit into main memory.
Then we read Buy once to join these tuples with the Buy tuples. The Buy table is 400 GB, so this takes (400000 MB)/(100 MB/s) = 4000 s to scan. Overall this query takes roughly 4420 s, or a bit more than an hour.
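The arithmetic above can be checked with a short sketch (the 100 MB/s sequential scan rate and the table sizes are taken from the plan itself):

```python
# Plan (i): one sequential scan of each of the three tables.
SCAN_MB_PER_S = 100
member_s = 2000 / SCAN_MB_PER_S          # 2 GB Member scan -> 20 s
transaction_s = 40000 / SCAN_MB_PER_S    # 40 GB Transaction scan -> 400 s
buy_s = 400000 / SCAN_MB_PER_S           # 400 GB Buy scan -> 4000 s
total = member_s + transaction_s + buy_s
print(total)  # 4420.0 seconds, a bit over an hour
```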

(ii). Query plan for the query:

We start by scanning Member and keep only those members who live in Portland; this result fits into main memory.

Then we join this result with Transaction by scanning it once, leaving us with 0.5% of 1 billion transactions, which we project down to mid and bid. Then we scan Branch once to join with it, and while scanning it we discard tuples for branches not in Austin.
Therefore the cost is that of scanning each table once, which is about 400 s (40000 MB/100 MB/s) + 20 s (2000 MB/100 MB/s) + 0.4 ms (0.04 MB/100 MB/s) ≈ 420 s.
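The same sanity check for this plan (sizes and scan rate as stated above; the Branch scan is negligible):

```python
# Plan (ii): one sequential scan of Transaction, Member, and Branch.
SCAN_MB_PER_S = 100
t = 40000 / SCAN_MB_PER_S   # Transaction: 400 s
t += 2000 / SCAN_MB_PER_S   # Member: 20 s
t += 0.04 / SCAN_MB_PER_S   # Branch (0.04 MB): 0.0004 s
print(t)  # ~420 s; the Branch scan contributes essentially nothing
```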

3c.
For the sparse index on mid in Transaction, each index entry has size (8 + 8) = 16 bytes, so the maximum number of entries in a 4 KB node is about 250.
With 80% occupancy, we get about 200 entries per node.
This is a sparse index, and there are 100 tuples per page, so we have only one index entry for every 100 tuples in Transaction.
Thus we have 10 million index entries, located in about 10 million / 200 = 50,000 leaf nodes.
On the next level we have about 250 nodes, then 2 nodes, then the root. So there are 4 levels of nodes in the tree, and the size is dominated by the 50,000 leaf nodes of 4 KB each (about 200 MB).
The cost of fetching a tuple is the cost of 5 random disk accesses (4 index levels plus the data page), about 50 ms, ignoring the trivial transfer cost per block.
For the dense index on pid in Buy, each index entry is also 16 bytes, so 200 entries per node as before. But now we have 10 billion index entries (one for each tuple in Buy), located in 10 billion / 200 = 50 million leaf nodes. On the next level we have 250,000 nodes, then 1,250, then 7, then the root.
So the depth is now 5, and the access cost (5 index levels plus the data page, i.e. 6 disk accesses) is 60 ms. The index now has size 4 KB * 50 million, or 200 GB.
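The level counts above follow mechanically from the fanout; a small sketch under the stated assumptions (16-byte entries, 4 KB nodes at 80% occupancy, so ~200 entries per node):

```python
import math

ENTRIES_PER_NODE = 200  # ~250 max per 4 KB node, at 80% occupancy

def btree_levels(num_entries):
    # Node counts per level, leaves first, up to a single root node.
    levels = [math.ceil(num_entries / ENTRIES_PER_NODE)]
    while levels[-1] > 1:
        levels.append(math.ceil(levels[-1] / ENTRIES_PER_NODE))
    return levels

# Sparse index on Transaction.mid: 1 entry per 100-tuple page -> 10M entries.
sparse = btree_levels(10_000_000)
# Dense index on Buy.pid: 1 entry per tuple -> 10B entries.
dense = btree_levels(10_000_000_000)

print(sparse)  # [50000, 250, 2, 1]: 4 levels, ~200 MB of leaves
print(dense)   # [50000000, 250000, 1250, 7, 1]: 5 levels, ~200 GB of leaves

# Fetch cost: one random I/O (10 ms) per level, plus one for the data page.
print((len(sparse) + 1) * 10, (len(dense) + 1) * 10)  # 50 60 (ms)
```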

Ques 4.a) Given:

RPM: 9600
Platters: 3, double-sided
No. of tracks per surface = 500,000
No. of sectors/track = 1000
Size of sector = 512 bytes
i) Disk capacity = (bytes/sector) * (sectors/track) * (tracks/surface) * (surfaces/platter) * (platters/disk)
= 512 * 1000 * 500,000 * 2 * 3
= 1.536 * 10^12 bytes = 1.536 TB, approximately 1.5 TB
So the disk capacity is 1.536 TB.
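The product above can be verified directly (tracks are per surface, and each of the 3 platters has 2 surfaces):

```python
# Disk capacity from the given geometry.
bytes_per_sector = 512
sectors_per_track = 1000
tracks_per_surface = 500_000
surfaces = 2 * 3  # 3 double-sided platters
capacity = bytes_per_sector * sectors_per_track * tracks_per_surface * surfaces
print(capacity)  # 1536000000000 bytes = 1.536 * 10^12 bytes = 1.536 TB
```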

ii) Max rate at which data can be read from the disk:

Time required for one rotation = (1/9600)*(60) = 0.00625 s
Bytes/track = (bytes per sector)*(sectors per track) = 512 * 1000 = 512 KB
Since one track of data can be transferred per revolution, the data transfer rate is 512 KB / 0.00625 s = 81,920 KB/s.
Max rate = 81.92 MB/s
iii) Average rotational latency: half the time it takes the disk to make one revolution.
Time required for one rotation = (1/9600)*(60) = 0.00625 s = 6.25 ms
Average rotational latency = 6.25/2 = 3.125 ms
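Both figures follow from the rotation time; a quick check:

```python
# Transfer rate and rotational latency at 9600 RPM.
rpm = 9600
rotation_s = 60 / rpm                    # 0.00625 s per revolution
track_bytes = 512 * 1000                 # 512 KB of data per track
rate_mb_s = track_bytes / rotation_s / 1_000_000  # one track per revolution
avg_rot_latency_ms = rotation_s / 2 * 1000        # half a revolution
print(rate_mb_s, avg_rot_latency_ms)  # 81.92 MB/s, 3.125 ms
```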

4.b) Average seek time = 4 ms

Average rotational latency = 4 ms
Maximum transfer rate = 80 MB/s
Block Model:
a) We have to read a file of size 4 KB.
Block transfer time = seek time + rotational latency + transfer time
Transfer time for 4 KB: in 1000 ms we can transfer 80*1024 KB, so 4 KB takes 4*1000/(80*1024) = 0.049 ≈ 0.05 ms.
Therefore, average read time = 4 ms + 4 ms + 0.05 ms = 8.05 ms
Time to read a file of size 4 KB = 8.05 ms
b) File of size 500 KB:
500/4 = 125 blocks
Therefore, T = 125 * 8.05 = 1006.25 ms
c) Now the file size is 50 MB:
50 MB / 4 KB = 12,500 blocks
Therefore, T = 12,500 * 8.05 = 100,625 ms
The block model gives a vast overestimate of the time actually needed, because it charges a full seek and rotational delay to every 4 KB block even when the file is read sequentially.
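The block-model numbers can be reproduced as follows (using the same 1 MB = 1024 KB convention as the transfer-time line above):

```python
# Block model: every 4 KB block pays a full seek + rotation + transfer.
seek_ms, rot_ms = 4.0, 4.0
xfer_ms = round(4 * 1000 / (80 * 1024), 2)  # ~0.049 ms, rounded to 0.05 ms
block_ms = seek_ms + rot_ms + xfer_ms       # 8.05 ms per 4 KB block
print(block_ms)            # 8.05 ms for a 4 KB file
print(125 * block_ms)      # ~1006.25 ms for 500 KB (125 blocks)
print(12_500 * block_ms)   # ~100625 ms for 50 MB (12,500 blocks)
```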
Latency + Transfer Rate Model:
LTR read time = latency + transfer time to read the file
= seek time + rotational latency + transfer time
Transfer rate = 80 MB/s

a) For a 4 KB file, transfer time = (4*1000)/(80*1000) = 0.05 ms


T = 4 + 4 + 0.05 = 8.05 ms
b) File of size 500 KB:
Transfer time for 500 KB = (500*1000)/(80*1000) = 6.25 ms
T = 4 + 4 + 6.25 = 14.25 ms
c) File of size 50 MB:
Transfer time for 50 MB = (50*1000*1000)/(80*1000) = 625 ms
T = 4 + 4 + 625 = 633 ms
Thus the latency + transfer rate model is much more realistic than the block model: it pays the seek and rotational delay only once per file.
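A small helper makes the pattern explicit: one seek plus one rotational delay, then the whole file streams at the full transfer rate.

```python
# Latency + transfer-rate model: pay seek + rotation once, then stream.
def ltr_ms(size_kb, seek_ms=4.0, rot_ms=4.0, rate_kb_per_ms=80.0):
    # 80 MB/s = 80 KB/ms with the 1 MB = 1000 KB convention used above.
    return seek_ms + rot_ms + size_kb / rate_kb_per_ms

print(ltr_ms(4))       # 8.05 ms for 4 KB
print(ltr_ms(500))     # 14.25 ms for 500 KB
print(ltr_ms(50_000))  # 633.0 ms for 50 MB
```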
4.c
Phase 1: Repeat until all data is read: read 2 GB of data, sort it in main memory using any in-memory sorting algorithm, and write it out to a new file (a sorted run).
The time to read 2 GB is 4 + 4 + 2048/80 * 1000 ms ≈ 25.6 s.
Reading and writing 400 such runs takes 25.6 * 2 * 400 = 20,480 s.
Phase 2: Merge the 400 runs created in Phase 1 in one pass.
Main memory is divided into 401 buffers (400 input buffers plus 1 output buffer), each of size 2048/401 ≈ 5.10 MB.
For each buffer, the read or write time is 4 + 4 + 5.10/80 * 1000 = 71.75 ms.
The 800 GB of data divides into 800*1024/5.10 ≈ 160,627 buffer-sized pieces.
The total Phase 2 time is 71.75 * 160,627 * 2 ≈ 23,050 s.
The total time for sorting the 800 GB file with one merge pass is about 20,480 + 23,050 = 43,530 s, roughly 12 hours.
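The two-phase cost can be recomputed without intermediate rounding; the exact totals come out within a few seconds of the rounded figures above (~43,530 s):

```python
# Two-phase external merge sort cost model (times in seconds).
seek_rot_s = 0.008            # 4 ms seek + 4 ms rotational latency
rate_mb_s = 80.0              # sequential transfer rate
mem_mb = 2048.0               # 2 GB of main memory
data_mb = 800 * 1024.0        # 800 GB file

# Phase 1: read, sort in memory, and write back 400 runs of 2 GB each.
run_io_s = seek_rot_s + mem_mb / rate_mb_s   # ~25.6 s per 2 GB read or write
phase1_s = run_io_s * 2 * 400                # read + write every run

# Phase 2: merge all 400 runs at once with 401 buffers of 2048/401 MB.
buf_mb = mem_mb / 401
buf_io_s = seek_rot_s + buf_mb / rate_mb_s   # ~71.8 ms per buffer I/O
pieces = data_mb / buf_mb                    # 400 * 401 = 160,400 pieces
phase2_s = buf_io_s * pieces * 2             # each piece is read and written

print(round(phase1_s), round(phase2_s), round(phase1_s + phase2_s))
```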
