
Problem 5

https://sirupsen.com/napkin/problem-5

Attempt

q1

What is the performance of the query in the worst-case, where we load only one product per page?

Ref:

For 100 pages:
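As a rough sketch of that arithmetic (the ~100 us figure for a random 8 KiB SSD read is an assumed reference-sheet number, not given in the problem):

    # Napkin sketch: worst case, one random SSD page read per product.
    PAGES = 100                  # one product per page
    SSD_RANDOM_READ_US = 100     # assumed ~100 us per random 8 KiB SSD read

    total_us = PAGES * SSD_RANDOM_READ_US
    print(total_us, "us")        # 10000 us, i.e. roughly 10 ms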

q2

(2) What is the worst-case performance of the query when the pages are all in memory cache (typically that would happen after (1))?

Total: 165 us

We would also need to add this time to the answer in q1, but the disk reads are so much slower that we can basically ignore the in-memory scan.
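One breakdown that reproduces the 165 us total above, assuming 16 KB pages, ~50 ns per random memory access, and ~100 us per MB of sequential memory read (all assumed figures):

    # Napkin sketch: 100 cached pages, jump to each page then scan it.
    PAGES = 100
    PAGE_KB = 16                    # assumed page size
    MEM_RANDOM_ACCESS_NS = 50       # assumed cost to jump to each page
    MEM_SEQ_READ_US_PER_MB = 100    # assumed ~10 GB/s sequential memory read

    random_us = PAGES * MEM_RANDOM_ACCESS_NS / 1_000
    scan_us = (PAGES * PAGE_KB / 1_000) * MEM_SEQ_READ_US_PER_MB
    print(random_us + scan_us, "us")   # 5 + 160 = 165 us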

q3

If we changed the primary key to be (shop_id, id), what would the performance be when (3a) going to disk, and (3b) hitting cache?

The per-page disk and memory load and scan times stay the same; what changes is how many pages we need to read.

With the primary key changed to (shop_id, id), records in the index pages are sorted by shop_id first, then by id:

shop_id | id
1       | 1
1       | 2
1       | 4
2       | 1
2       | 10
2       | 13

The B+ tree index will quickly give us the first leaf page containing shop_id = 13. And since leaf nodes in a B+ tree are linked together, we can iterate through that page's entries and then follow the links to subsequent pages until we find 100 records.
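A toy sketch of that traversal, with a made-up in-memory leaf structure (illustrative only, not InnoDB's actual layout):

    # Toy linked leaf pages: each leaf holds sorted (shop_id, id) records
    # plus a pointer to the next leaf. Hypothetical structure for illustration.
    class Leaf:
        def __init__(self, records, next_leaf=None):
            self.records = records          # sorted by (shop_id, id)
            self.next_leaf = next_leaf

    def first_100_for_shop(leaf, shop_id, limit=100):
        """Walk the leaf chain from the first matching page, collecting records."""
        out = []
        while leaf is not None and len(out) < limit:
            for shop, rec_id in leaf.records:
                if shop == shop_id:
                    out.append((shop, rec_id))
                    if len(out) == limit:
                        return out
                elif shop > shop_id:        # records are sorted, so we can stop
                    return out
            leaf = leaf.next_leaf           # follow the link to the next page
        return out

    last = Leaf([(13, 7), (14, 1)])
    first = Leaf([(12, 9), (13, 2), (13, 5)], last)
    print(first_100_for_shop(first, 13))    # [(13, 2), (13, 5), (13, 7)]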

pages

The diagram implies 8 records per page, but that does not seem realistic. If the page only stored the (shop_id, id) pairs, that's 16 bytes per record.

That would be around 1,000 records per 16 KiB page.

Let's say for some reason it's not just an index page, but also holds other data. Even with 10 columns at 8 bytes each, that's still about 200 records per page.

So either way, we only need to look at 1 page.

If it really were 8 records per page, we would need to load 13 pages (100 / 8, rounded up).
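The same arithmetic as a sketch, assuming 16 KiB pages (InnoDB's default page size):

    import math

    PAGE_BYTES = 16 * 1024               # assumed 16 KiB pages
    NEEDED = 100

    index_only = PAGE_BYTES // 16         # (shop_id, id) at 8 bytes each
    wide_rows = PAGE_BYTES // (10 * 8)    # 10 columns at 8 bytes each
    print(index_only, wide_rows)          # 1024 and 204 records per page

    print(math.ceil(NEEDED / index_only))   # 1 page either way
    print(math.ceil(NEEDED / 8))            # 13 pages at the diagram's 8 records/page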

q3a

Random 8KB SSD:

q3b

random access:

sequential scan:
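A sketch covering both cases for a single page, with the same assumed figures as above (~100 us per random 8 KiB SSD read, ~50 ns per random memory access, ~100 us per MB scanned in memory):

    PAGE_KB = 16                                       # assumed page size
    disk_us = 1 * 100                                  # (3a) one random SSD page read
    cache_us = 50 / 1_000 + (PAGE_KB / 1_000) * 100    # (3b) jump to the page, then scan it

    print(disk_us, "us")                   # ~100 us from disk
    print(round(cache_us, 2), "us")        # ~1.65 us from memory cache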

Solution

https://sirupsen.com/napkin/problem-6

Their sequential scan estimate is:

We said:

Ours was a little slower, but in the right ballpark. I still like my estimate, if only because 1 MB / 100 us is easy to remember.

Lastly, they make a reasonable estimate that each record is 128 bytes:

So I’m glad I estimated that and thought that through!
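A quick sanity check on that figure, again assuming 16 KiB pages:

    PAGE_BYTES = 16 * 1024
    RECORD_BYTES = 128                     # their per-record estimate

    print(PAGE_BYTES // RECORD_BYTES)      # 128 records per page, so 100 still fit in one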