Elementary School Addition Algorithm
The elementary school addition algorithm works for binary numbers 1:
1 1 1 1 1 (carried digits)
0 1 1 0 1
+ 1 0 1 1 1
-------------
= 1 0 0 1 0 0 = 36
It works in binary (or any base) because the sum of any 3 single-digit numbers is at most two digits.
The time complexity is O(n): we need to result of the previous
iteration before we can correctly compute the
sum of the current iteration2.
Hardware Complexity
In certain software-based model of computation, adding can never
be faster than O(n) because we need to read each pair of bits
sequentially.
Hardware provides an alternative model with natural parallelism.3
The value of each bit in an n-bit number is stored on a wire.
Wires connect to circuits gates which transform the input to
zero or one. All wires are transformed simultaneously,
not one at a time.
For example, let’s compute the bitwise AND of two n-bit numbers a and b
by connecting their bit-wires to an AND gate:
To calculate the time complexity of this operation, we look at the
height of the circuit diagram. In this case, the height is
independent of n, the number of bits.
Therefore, bit-wise AND can be calculated in O(1) constant time.
The trade-off is the space complexity, analogous to memory complexity
in the software model. The space complexity is proportional to the
number of gates used. We need an AND gate for each bit, so
the space complexity is O(n).
Elementary School Adder
If we implement the elementary school adder in hardware,
the time complexity is still O(n). The parallelism doesn’t
help, because each circuit has to the wait for the previous
carry digit:
You can also think about it in terms of height like the last
example, where the top is the first bits \(a_0\) and \(b_0\)
and the bottom is \(s_3\). The height grows at O(n).
Carry-lookahead Adder
We can improve the algorithm by looking at the states that produce a carry.
If a AND b, then there will be a carry. If a OR b and there is a carry from
the last iteration, then there will be a new carry. These are called
generating and propagating carries respectively. We can summarize this
with symbols as follows:
\[ \begin{align*} c_{i + 1} &= a_i b_i + (a_i + b_i) c_i \\ c_{i + 1} &= g_i + p_i c_i \end{align*} \]
Let’s work out the 4-bit carries:
\[ \begin{align*} c_{1} &= g_0 + p_0 c_0 \\ c_{2} &= g_1 + p_1 c_1 \\ c_{3} &= g_2 + p_2 c_2 \\ c_{4} &= g_3 + p_3 c_3 \\ \end{align*} \]
At first, this doesn’t seem like an improvement, because each carry depends on the previous carry. But let’s substitute carries into \(c_4\):
\[ \begin{align*} c_{4} &= g_3 + p_3(g_2 + p_2 (g_1 + p_1 (g_0 + p_0 c_0))) \\ c_{4} &= g_3 + p_3g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0 \end{align*} \]
The longest product term is \(p_3 p_2 p_1 p_0 c_0\),
with length proportional to n. Calculating this with
successive AND operations is still O(n). But we can
divide-and-conquer to calculate this in O(log n). For example:
Recall that the height of the circuit is the time complexity.
This circuit forms a binary tree (upside down),
which is known to have height O(log n).
So a long chain of dependencies can be
reorganized as a balanced tree, changing the time complexity from
O(n) to O(log n). And since we can compute all the carries in parallel,
the time complexity for the sum also becomes O(log n).
Space Complexity
Where n is the number of bits being added, we can count the total gates
of the binary tree circuit as a geometric series:
\[ 1 + 2 + 4 + \cdots + \frac{n}{2} + n = 2n - 1 \]
So the overall space complexity is O(n)4.
Real-world Implementation
Carry-lookahead adders can also be constructed as intermediate blocks and chained together. For example, Harris and Harris describe eight 4-bit carry-lookahead blocks chained together to compute a 32-bit sum.
This design is a nice compromise, saving gates by avoiding one full-width 32-bit carry-lookahead network5, while still achieving some speedup from the 4-bit carry-lookahead in each block.
Example: https://en.wikipedia.org/wiki/Binary_number#Addition↩︎
Also known as the ripple carry adder.↩︎
Thanks for Robert Ehrlich for the correction↩︎
Which can still be practical!↩︎