https://sirupsen.com/napkin/problem-8
The updated_at field is in a db, so we also might need to think about loading pages, iterating over them, etc.
Assuming they are on similar computers, I don’t see why the server or client would be faster than the other.
Say 200 bytes per row. Pages are 16 KB.
Most of the time is spent loading the pages, not hashing them.
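As a quick sketch of that page math (assuming 10^5 rows at ~200 bytes each, which lines up with the ~20 MB total mentioned below):

ROW_BYTES = 200            # assumed average row size
PAGE_BYTES = 16 * 1024     # 16 KB pages
ROWS = 100_000             # assumed row count (~20 MB total)

rows_per_page = PAGE_BYTES // ROW_BYTES                    # ~81 rows per page
total_pages = (ROWS + rows_per_page - 1) // rows_per_page  # ~1,235 pages
total_bytes = ROWS * ROW_BYTES                             # 20,000,000 bytes, about 20 MB
print(rows_per_page, total_pages, total_bytes)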
This would require more disk and page reads if the table was not indexed on user_id.
Okay, maybe the client and server have similar computers, but if the client only holds the data for user_id = 1, then:
That could easily be in the client’s memory.
Worst case, we need a random access per page:
Total: 62.5 us + 400 us (to hash) ≈ 460 us
An index on user_id is important, but I basically assumed in (a) that it already existed.
At first I thought maybe we could use updated_at to filter the rows we need to hash, but user 1 could have updated any of the rows.
The total dataset is only 20 MB, which is reasonable to send to both client and server.
But how do we merge the rows? Suppose we went offline at 0:
-- initial
id | user_id | desc1 | desc2 | updated_at
1 | 1 | hello | world | 100
-- client
id | user_id | desc1 | desc2 | updated_at
1 | 1 | hello | WORLD | 100
-- server
id | user_id | desc1 | desc2 | updated_at
1 | 1 | HELLO | world | 200
If all we had were the final states, the best we could do would be to take the latest value. Without an additional data structure tracking the changes, we don’t know what the original values were.
For all the server knows, the old value of desc2 was WORLD and the update at 200 changed WORLD -> world.
Given that there were only two updates, a client at 100 and the server at 200, we know the merged dataset should be:
id | user_id | desc1 | desc2 | updated_at
1 | 1 | HELLO | WORLD | 200
Therefore, we know that we need to store each update, what was changed, and when it happened. Then apply that sequence during the merge.
Something like this:
[
  {
    "id": 1,
    "desc2": "WORLD",
    "updated_at": 100
  },
  {
    "id": 1,
    "desc1": "HELLO",
    "updated_at": 200
  }
]
We can sort these by updated_at and apply them in order.
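A minimal sketch of that replay, using the two records above as the change log (function and variable names are made up for illustration):

# Sketch: replay change records in timestamp order to merge client and server state.
# Each record looks like {"id": ..., "<changed field>": ..., "updated_at": ...}.
def merge(initial_rows, client_changes, server_changes):
    rows = {row["id"]: dict(row) for row in initial_rows}
    changes = sorted(client_changes + server_changes, key=lambda c: c["updated_at"])
    for change in changes:
        row = rows[change["id"]]
        for field, value in change.items():
            if field != "id":
                row[field] = value  # updated_at gets overwritten too, ending at the latest value
    return rows

initial = [{"id": 1, "user_id": 1, "desc1": "hello", "desc2": "world", "updated_at": 100}]
client  = [{"id": 1, "desc2": "WORLD", "updated_at": 100}]
server  = [{"id": 1, "desc1": "HELLO", "updated_at": 200}]
print(merge(initial, client, server))
# {1: {'id': 1, 'user_id': 1, 'desc1': 'HELLO', 'desc2': 'WORLD', 'updated_at': 200}}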
In the last example there’s a pretty clear “right” way to merge the rows. But there is no universal algorithm for merging: it’s up to the programmer to decide what’s right for the application.
For example, let’s say the client runs two updates, and the server runs one:
-- client
id | user_id | desc1 | desc2 | updated_at
1 | 1 | hello | WORLD | 100
1 | 1 | hello | World | 200
-- server
id | user_id | desc1 | desc2 | updated_at
1 | 1 | hello | worlD | 300
The application needs to make a decision on how to merge the changes. Both changed the original value world. And the client changed it first! But the server made the latest change. However, the client made the last “version” change.
In text fields like desc2, you might often want to combine them into a final result like WorlD.
For merging text fields specifically, check out:
https://timmastny.com/rga/
Detecting the conflict is the first step; the application also needs to decide how to resolve it.
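One way to sketch both steps, assuming we kept the state from the last sync around as a common ancestor; the resolver passed in at the end (prefer the server’s value) is only a placeholder for whatever policy the application chooses:

# Sketch: field-level three-way merge against the last synced state ("base").
def three_way_merge(base, client, server, resolve):
    merged = dict(base)
    for field in base:
        c, s = client[field], server[field]
        if c == s:
            merged[field] = c
        elif c == base[field]:
            merged[field] = s  # only the server changed this field
        elif s == base[field]:
            merged[field] = c  # only the client changed this field
        else:
            # Both sides changed the same field: this is the conflict to resolve.
            merged[field] = resolve(field, base[field], c, s)
    return merged

base   = {"desc2": "world"}
client = {"desc2": "World"}   # client's latest version of the row
server = {"desc2": "worlD"}
print(three_way_merge(base, client, server,
                      resolve=lambda field, old, c, s: s))  # placeholder: take the server's value
# {'desc2': 'worlD'}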
Funny enough, the revision table from problem 7 provides a nice way to store all the updates if we wanted to use a state-based CRDT.
https://sirupsen.com/napkin/problem-9
Estimates:
256 bytes per row, 25 MB of data
Assume everything is on disk
Hash: the updated_at field, 8 bytes * 10^5 rows = 8 * 10^5 bytes = 0.8 MB, or about 400 us (at roughly 2 GB/s of hashing throughput)
Reading from db:
This is way faster than mine; what did I miss?
This assumes the table is indexed on user_id. If it’s not, we’ll need to do a full table scan.
When the query planner knows that it must do a full table scan, it won’t use any index, because doing “sequential” random reads while iterating through leaf nodes on a B+ tree is slower than doing all the random reads at once.
The idea is that the B+ tree pointer to the next page is not known ahead of time, but for a full table scan we don’t need to go in any order, and can request all pages from the SSD at once.
And that’s what he meant by “pre-fetching”.
Syncs per second:
Client:
Idea: hash (user_id, updated_at). With a composite index on user_id and updated_at, we can read updated_at straight from the index, so the pages are much more dense and we process fewer pages per disk read! Because each B+ tree record would be 8-16 bytes rather than ~200.
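A quick way to try that idea (a sketch only: SQLite stands in for the real database, and the table and index names are made up). With a composite index on (user_id, updated_at), the hash below can be computed from the index alone, without touching the wide rows:

import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, user_id INTEGER, desc1 TEXT, desc2 TEXT, updated_at INTEGER)")
# Narrow composite index: the SELECT below reads these entries, not the ~200-byte rows.
conn.execute("CREATE INDEX idx_items_user_updated ON items (user_id, updated_at)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?)",
                 [(i, i % 100, "hello", "world", i) for i in range(1, 10_001)])

h = hashlib.sha256()
for (updated_at,) in conn.execute(
        "SELECT updated_at FROM items WHERE user_id = ? ORDER BY updated_at", (1,)):
    h.update(updated_at.to_bytes(8, "little"))
print(h.hexdigest())  # the per-user hash the client and server would compare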
Merkle Trees: what do I know about them before reading the solution?
I think they work something like this:
The idea is that each of you can compute and compare hABCD. If they differ, then you compare hAB and hCD. Say hAB matches: then you know those rows are the same. If hCD differs, then at last you compare hC and hD to find out which row is different.
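A toy version of that comparison (a sketch, not how any particular system implements it): build a binary hash tree over per-row hashes and only descend into subtrees whose hashes differ.

import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def build_tree(leaf_hashes):
    """Return the tree as a list of levels: leaves first, root last."""
    levels = [leaf_hashes]
    while len(levels[-1]) > 1:
        level = levels[-1]
        levels.append([h(level[i] + (level[i + 1] if i + 1 < len(level) else b""))
                       for i in range(0, len(level), 2)])
    return levels

def diff(tree_a, tree_b):
    """Walk down from the roots; return indexes of the leaves that differ."""
    if tree_a[-1] == tree_b[-1]:
        return []
    frontier = [0]  # start at the root
    for level_a, level_b in zip(reversed(tree_a[:-1]), reversed(tree_b[:-1])):
        frontier = [child for i in frontier for child in (2 * i, 2 * i + 1)
                    if child < len(level_a) and level_a[child] != level_b[child]]
    return frontier

rows_a = [b"A", b"B", b"C", b"D"]
rows_b = [b"A", b"B", b"C", b"d"]   # only row D differs
print(diff(build_tree([h(r) for r in rows_a]),
           build_tree([h(r) for r in rows_b])))  # [3] -> only leaf D needs syncing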
What does the implementation look like?
Store a hash of each row’s updated_at and all the parent nodes of the tree.
From the solution:
Send the tree
Send part of the tree
Sync