https://sirupsen.com/napkin/problem-5
(1) What is the performance of the query in the worst case, where we load only one product per page?
Ref:
For 100 pages:
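A quick sketch of that arithmetic, assuming the usual napkin-math reference value of roughly 100 us per random 8 KB SSD read:

    # Worst case: each of the 100 products sits on a different page,
    # so every product costs one random SSD read.
    RANDOM_SSD_READ_US = 100   # assumed reference value, ~100 us per 8 KB random read
    pages = 100

    total_us = pages * RANDOM_SSD_READ_US
    print(total_us / 1000, "ms")   # -> 10.0 ms in the worst case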
(2) What is the worst-case performance of the query when the pages are all in memory cache (typically that would happen after (1))?
Total: 165 us
We would also need to add this time to the answer in q1. But the disk read is so much slower that we can basically ignore the in-memory scan.
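One possible decomposition that lands on 165 us, a sketch only, assuming 16 KB pages, ~50 ns per random memory access, and ~100 us to sequentially scan 1 MB of memory:

    PAGE_KB = 16                # assumed page size
    MEM_JUMP_US = 0.05          # assumed ~50 ns to hop to each cached page
    MEM_SCAN_US_PER_MB = 100    # assumed ~1 MB / 100 us sequential in-memory scan

    pages = 100
    jump_us = pages * MEM_JUMP_US                          # 5 us of random memory accesses
    scan_us = pages * PAGE_KB / 1000 * MEM_SCAN_US_PER_MB  # 1.6 MB of scanning ~= 160 us
    print(jump_us + scan_us, "us")                         # -> 165.0 us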
If we changed the primary key to be (shop_id, id), what would the performance be when (3a) going to disk, and (3b) hitting cache?
The overall disk and memory load and scan times would be the same. What changes is how many pages we need to read.
We are given that we have an index of the form (shop_id, id), so the individual pages will be sorted by shop_id first, then by id:
shop_id | id
1 | 1
1 | 2
1 | 4
2 | 1
2 | 10
2 | 13
Likewise, the B+ tree index will quickly give us the first page containing shop_id = 13. And in B+ trees, leaf nodes are linked together, so we can iterate through the page entries, and then go on to the next pages until we find 100 records.
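As a toy model of why this works (the shop and product counts below are made up): with entries sorted by the composite key, a single seek lands on the first shop_id = 13 entry, and the next 100 records are contiguous.

    from bisect import bisect_left

    # Index entries sorted by the composite key (shop_id, id), the way
    # leaf pages of a (shop_id, id) B+ tree are ordered.
    entries = sorted((shop_id, product_id)
                     for shop_id in range(1, 50)
                     for product_id in range(1, 500))

    # The B+ tree seek: find the first entry with shop_id = 13 ...
    start = bisect_left(entries, (13, 0))

    # ... then the 100 records we want are simply the next entries,
    # i.e. they sit on one leaf page (or a few adjacent ones).
    first_hundred = entries[start:start + 100]
    assert all(shop_id == 13 for shop_id, _ in first_hundred)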
The diagram implies 8 records per page, but that does not seem realistic. If the page only stored the (shop_id, id) pairs, each entry would be about 16 bytes. That would be roughly 1,000 records per 16 KB page.
Let’s say for some reason it’s not just an index page, but also has other data. If we had 10 columns at 8 bytes each, that’s still about 200 records per page.
So either way, we only need to look at 1 page.
If it were 8 records per page, then we would need to load 13 pages.
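The page arithmetic as a rough sketch, assuming 16 KB pages and 8-byte columns:

    import math

    PAGE_BYTES = 16 * 1024       # assumed 16 KB pages

    def pages_needed(record_bytes, records=100):
        records_per_page = PAGE_BYTES // record_bytes
        return math.ceil(records / records_per_page)

    print(pages_needed(16))      # (shop_id, id) only: ~1,000 records/page -> 1 page
    print(pages_needed(80))      # 10 columns x 8 bytes: ~200 records/page -> 1 page
    print(math.ceil(100 / 8))    # the diagram's 8 records/page -> 13 pages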
Random 8KB SSD:
random access:
sequential scan:
https://sirupsen.com/napkin/problem-6
Their sequential scan estimate is:
We said:
Ours was a little slower, but in the right ballpark. I like my estimate, if only because 1 MB / 100 us is easy to remember.
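For reference, that rule of thumb expressed as a throughput:

    # 1 MB per 100 us as a scan rate:
    print(1e6 / 100e-6 / 1e9, "GB/s")   # -> 10.0 GB/s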
Lastly, they make a reasonable estimate that each record is 128 bytes:
So I’m glad I estimated that and thought that through!
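For comparison, a quick check of what 128-byte records imply for the earlier page math (again assuming 16 KB pages):

    PAGE_BYTES = 16 * 1024
    RECORD_BYTES = 128                     # their estimate

    print(PAGE_BYTES // RECORD_BYTES)      # 128 records per page
    print(100 * RECORD_BYTES / 1000)       # 100 records ~= 12.8 KB, well under one page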