Real World Asymptotic Complexity

$$ \def\hl{\bbox[yellow]} \def\constant{\enclose{circle}[mathcolor="blue"]} \def\term{\enclose{circle}[mathcolor="green"]} \def\blueci{\enclose{circle}[mathcolor="blue"]} \def\greenci{\enclose{circle}[mathcolor="green"]} \def\orangeci{\enclose{circle}[mathcolor="orange"]} \def\U{\mathbb{U}} \def\N{\mathbb{N}} \def\Z{\mathbb{Z}} \def\Q{\mathbb{Q}} \def\R{\mathbb{R}} \newcommand{\green}[1]{\textcolor{green}{#1}} \newcommand{\blue}[1]{\textcolor{blue}{#1}} \newcommand{\floor}[1]{\lfloor #1 \rfloor} \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\card}[1]{\lvert #1 \rvert} $$

Without a doubt, asymptotic analysis is the most important tool in any algorithm designer’s toolbox. It provides a conceptual framework for reasoning about the runtime of an algorithm without executing it. Many of the field’s advancements would not have been possible without it. Regardless, please heed these words of caution: anything taken to an extreme has deleterious effects. It’s easy to obsess over asymptotic complexity (and performance by extension) because it is the focal point of the study of algorithms. However, the very core of software engineering is managing trade-offs, and overweighting a single variable ignores that reality. Asymptotic complexity is one data point for assessing the utility of an algorithm; it is not the whole story. The purpose of this page is to provide the reader with a holistic view of algorithmic analysis.

Principle of Parsimony

In other scientific domains, the concept of simplicity is a given. Albert Einstein is quoted as saying, “everything should be made as simple as possible, but not simpler.1” The sciences even have a term for the concept: the principle of parsimony (aka the law of parsimony or Occam’s razor). Roughly defined, it is “the principle that the most acceptable explanation of an occurrence, phenomenon, or event is the simplest, involving the fewest entities, assumptions, or changes.”2 For algorithms, this means the best implementation is the simplest code that meets all business needs while involving the fewest entities3.

The principle of parsimony implies that a software professional’s first and foremost responsibility is to deliver the simplest possible solution that meets the business need. Sometimes meeting the business need entails squeezing out every last ounce of performance; however, that is not typically the case. Pragmatism is the key to writing successful software. For those who doubt the veracity of this claim, consider this: Donald Knuth literally wrote the book on algorithmic analysis4, and there is no greater authority on the subject. The following quote comes directly from his writings5.

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. (emphasis added) - Donald Knuth

Much “blood and ink have been spilt” on the merits of software simplicity. A full treatment of the subject is well beyond the scope of this work. It’s sufficient to say, if you must obsess over a software metric, the most appropriate metric is pragmatism. The result of pragmatism is the tenacious pursuit of value. Value is roughly defined in the equation below.

\[\text{value} = \frac{\text{utility}}{\text{complexity}}\]

The more complex a solution, the lower its value. To firm this up, consider a fictitious example scenario. Suppose you are tasked with creating an algorithm capable of identifying the two closest objects in a two-dimensional plane. The algorithm below, which has a quadratic runtime ($\Theta(n^2)$), is simple and effective.

\begin{algorithm}
\caption{Closest Pair - Two Dimensions - Brute Force}
\begin{algorithmic}
\REQUIRE $A$ = set of 2-tuples representing $x$ and $y$ coordinates on a plane having $\vert A \vert \geq 2$
\OUTPUT the distance between the two closest points as measured by Euclidean distance
\FUNCTION{closestPairBruteForce}{A}
    \STATE closest-dist $\gets$ \CALL{distance}{$A_0$, $A_1$}
    \STATE
    \FOR{$p1 \in A$}
        \FOR{$p2 \in A$}
            \IF{$p1 \neq p2$ and \CALL{distance}{$p1$, $p2$} $\lt$ closest-dist}
                \STATE closest-dist $\gets$ \CALL{distance}{$p1$, $p2$}
            \ENDIF
        \ENDFOR
    \ENDFOR
    \STATE
    \RETURN closest-dist
\ENDFUNCTION
\STATE
\REQUIRE $p1, p2$ = 2-tuples representing x and y coordinates
\OUTPUT Euclidean distance between p1 and p2
\FUNCTION{distance}{$p1$, $p2$}
    \RETURN $\sqrt{(p2_x - p1_x)^2 + (p2_y - p1_y)^2}$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
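For readers who prefer concrete code, here is a minimal Python sketch of the brute-force approach (the function name and the small sample input are illustrative, not taken from any library). It compares each unordered pair of points once rather than every ordered pair, which is equivalent and still quadratic.

    from math import dist  # Euclidean distance helper (Python 3.8+)

    def closest_pair_brute_force(points):
        """Return the smallest Euclidean distance among a list of (x, y) tuples.

        Requires at least two points. Runs in Theta(n^2) time because every
        unordered pair of points is examined exactly once.
        """
        closest = dist(points[0], points[1])
        for i, p1 in enumerate(points):
            for p2 in points[i + 1:]:
                closest = min(closest, dist(p1, p2))
        return closest

    print(closest_pair_brute_force([(0, 0), (3, 4), (1, 1)]))  # 1.4142...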

Is there a more efficient solution? The answer is yes! The algorithm below is “linearithmic”6 ($\Theta(n \log_2{n})$). Don’t get distracted trying to thoroughly understand it as it is explained on a different page. However, take a few minutes to appreciate the complexity.

\begin{algorithm}
\caption{Closest Points}
\begin{algorithmic}
\REQUIRE A = set of 2-tuples representing x and y coordinates on a plane having $\vert A \vert \geq 2$
\OUTPUT the distance between the two closest points as measured by Euclidean distance
\FUNCTION{closestPair}{A}
    \STATE Ax $\gets$ A sorted by the x coordinate
    \STATE Ay $\gets$ A sorted by the y coordinate
    \RETURN \CALL{closestPairRecursive}{$Ax$, $Ay$}
\ENDFUNCTION
\STATE
\REQUIRE Ax = set of 2-tuples representing x and y coordinates sorted by the x coordinate
\REQUIRE Ay = set of 2-tuples representing x and y coordinates sorted by the y coordinate
\OUTPUT the distance between the two closest points as measured by Euclidean distance
\FUNCTION{closestPairRecursive}{Ax, Ay}
    \STATE $n \gets \vert Ax \vert$
    \STATE
    \IF{n $\leq$ 3}
        \RETURN \CALL{closestPairBruteForce}{Ax}
    \ENDIF
    \STATE
    \STATE \COMMENT{Divide the points down the middle of the x axis}
    \STATE mid $\gets \floor{\frac{n}{2}}$
    \STATE left-cut $\gets$ $\{Ax_0...Ax_{\text{mid} - 1}\}$
    \STATE right-cut $\gets$ $\{Ax_{\text{mid}}...Ax_{n-1}\}$
    \STATE
    \STATE \COMMENT{This x value is the dividing point}
    \STATE divide $\gets Ax_{\text{mid}-1}.x$
    \STATE
    \STATE \COMMENT{Split the y-sorted points into halves that match left-cut and right-cut}
    \STATE \COMMENT{This eliminates the need for an additional $O(n \log_2{n})$ sort operation}
    \STATE left-y $\gets \emptyset$
    \STATE right-y $\gets \emptyset$
    \FOR{$i \in Ay$}
        \IF{$i.x \leq$ divide}
            \STATE left-y $\gets \text{left-y} \cup \{i\}$
        \ELSE
            \STATE right-y $\gets \text{right-y} \cup \{i\}$
        \ENDIF
    \ENDFOR
    \STATE
    \STATE left-closest-dist $\gets$ \CALL{closestPairRecursive}{left-cut, left-y}
    \STATE right-closest-dist $\gets$ \CALL{closestPairRecursive}{right-cut, right-y}
    \STATE
    \STATE closest-dist $\gets$ \CALL{min}{left-closest-dist, right-closest-dist}
    \STATE
    \STATE \COMMENT{Create the strip}
    \STATE strip $\gets \emptyset$
    \FOR{$i \in Ay$}
        \STATE \COMMENT{$\vert \vert$ denotes absolute value in this instance}
        \IF{$\vert i.x - \text{divide} \vert \lt$ closest-dist}
            \STATE strip $\gets \text{strip} \cup \{i\}$
        \ENDIF
    \ENDFOR
    \STATE
    \STATE \COMMENT{Compare each point in the strip to the points within closest-dist below it}
    \FOR{$i \gets 0$ to $\vert\text{strip}\vert - 1$}
        \STATE $j \gets i + 1$
        \STATE \COMMENT{This while loop will never execute more than 6 times}
        \WHILE{$j < \vert\text{strip}\vert$ and $(\text{strip}_{j}.y - \text{strip}_i.y) \lt$ closest-dist}
            \STATE dist $\gets$ \CALL{distance}{$\text{strip}_i, \text{strip}_j$}
            \IF{dist $\lt$ closest-dist}
                \STATE closest-dist $\gets$ dist
            \ENDIF
            \STATE $j \gets j + 1$
        \ENDWHILE
    \ENDFOR
    \STATE
    \RETURN closest-dist
\ENDFUNCTION
\STATE
\REQUIRE $p1, p2$ = 2-tuples representing x and y coordinates
\OUTPUT Euclidean distance between p1 and p2
\FUNCTION{distance}{$p1$, $p2$}
    \RETURN $\sqrt{(p2_x - p1_x)^2 + (p2_y - p1_y)^2}$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
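For comparison, below is a Python sketch that mirrors the divide-and-conquer pseudocode above (same structure, illustrative names; the base case inlines a tiny brute-force search so the snippet stands alone). Even in this compact form, the gap in complexity relative to the brute-force version is hard to miss.

    from math import dist

    def closest_pair(points):
        """Theta(n log n) closest-pair distance for a list of (x, y) tuples."""
        ax = sorted(points)                      # sorted by x coordinate
        ay = sorted(points, key=lambda p: p[1])  # sorted by y coordinate
        return closest_pair_recursive(ax, ay)

    def closest_pair_recursive(ax, ay):
        n = len(ax)
        if n <= 3:  # brute force the base case
            return min(dist(p, q) for i, p in enumerate(ax) for q in ax[i + 1:])

        # Divide the points down the middle of the x axis
        mid = n // 2
        left_cut, right_cut = ax[:mid], ax[mid:]
        divide = ax[mid - 1][0]

        # Split the y-sorted points into halves matching left_cut and right_cut
        left_y = [p for p in ay if p[0] <= divide]
        right_y = [p for p in ay if p[0] > divide]

        closest = min(closest_pair_recursive(left_cut, left_y),
                      closest_pair_recursive(right_cut, right_y))

        # Build the strip of points within closest of the dividing line
        strip = [p for p in ay if abs(p[0] - divide) < closest]
        for i, p1 in enumerate(strip):
            for p2 in strip[i + 1:]:
                if p2[1] - p1[1] >= closest:
                    break  # only a constant number of neighbors can qualify
                closest = min(closest, dist(p1, p2))
        return closest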

Comparing the algorithms, Algorithm 2 is the more performant, although that isn’t immediately obvious from reading it. It also carries considerably higher implementation and maintenance costs, while Algorithm 1 is more malleable. Furthermore, Algorithm 2 is close to incomprehensible without commentary. The only reasonable justification for using Algorithm 2 is when the performance boost provides a competitive advantage that offsets the complexity costs. With modestly sized inputs or rare invocations, the performance deficit is likely unnoticeable. It is advisable to start with the simplest algorithm and upgrade only in the event of a noticeable efficiency shortfall.

Key Takeaways

  • The principle of parsimony implies that a software professional’s first and foremost responsibility is to deliver the simplest possible solution that meets the business need.
  • Pragmatism is a crucial component of designing and writing successful software.
  • $\text{value} = \frac{\text{utility}}{\text{complexity}}$
  • The best strategy is to start with the simplest implementation and upgrade only in the event of a noticeable performance deficit.

Constants Matter!

The Asymptotic Analysis section made it clear that growth rate is the most important runtime component. While this is true, that doesn’t mean that the constants and lower order terms are wholly irrelevant. To demonstrate, consider the case of matrix multiplication. Recognize that it is an extreme example; regardless, it demonstrates the concept beautifully.

The straightforward algorithm, shown below, is $\Theta(n^3)$. Again, do not attempt to master the algorithms on this page just yet as they receive full treatment later on. Take a few minutes to appreciate the complexity or rather the lack thereof.

\begin{algorithm}
\caption{Naive Matrix Multiplication}
\begin{algorithmic}
\REQUIRE $n$ = size of the matrices
\REQUIRE $A,B$ = $n \times n$ matrices
\OUTPUT $A * B$
\FUNCTION{naiveMatrixMultiply}{$n,A,B$}
	\STATE $C \gets n \times n$ matrix
    \FOR{$i \gets 0$ to $n-1$}
		\FOR{$j \gets 0$ to $n-1$}
            \FOR{$k \gets 0$ to $n-1$}
				\STATE $C_{i,j} = C_{i,j} + A_{i,k} * B_{k,j}$
			\ENDFOR
		\ENDFOR
	\ENDFOR
	\RETURN $C$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
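Rendered as Python (plain nested lists, no libraries; the function name is illustrative), the naive algorithm is only a few lines:

    def naive_matrix_multiply(n, a, b):
        """Multiply two n x n matrices stored as lists of lists. Theta(n^3)."""
        c = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    c[i][j] += a[i][k] * b[k][j]
        return c

    print(naive_matrix_multiply(2, [[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # [[19, 22], [43, 50]]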

Now consider Strassen’s matrix multiplication algorithm, shown below, which has a runtime of $\Theta(n^{\log_2{7}}) \approx \Theta(n^{2.8})$. Contemplating asymptotic complexity in a vacuum would lead one to assume that this is the superior algorithm in all cases. However, the constants hidden by the notation are unusually high. Even with a carefully tuned implementation, a performance gain isn’t obvious with matrices smaller than $1024 \times 1024$.

\begin{algorithm}
\caption{Strassen Matrix Multiplication}
\begin{algorithmic}
\REQUIRE $n$ = size of the matrices where $n$ is a power of 2 (4, 8, 16, 32,...)
\REQUIRE $A,B$ = $n \times n$ matrices
\OUTPUT $A * B$
\FUNCTION{strassenMatrixMultiply}{$n,A,B$}
    \IF{$n \leq 16$} \COMMENT{tune this value for your platform}
        \RETURN \CALL{naiveMatrixMultiply}{n, A, B}
    \ENDIF
    \STATE \COMMENT{Split the matrices into sub-matrices that are $\frac{1}{4}$ the size of the original}
    \STATE $\begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix} \gets \text{submatrices of} \; A$
    \STATE $\begin{bmatrix} B_1 & B_2 \\ B_3 & B_4 \end{bmatrix} \gets \text{submatrices of} \; B$
    \STATE
    \STATE $P_1 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A_1, B_2 - B_4$}
    \STATE $P_2 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A_1 + A_2, B_4$}
    \STATE $P_3 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A_3 + A_4, B_1$}
    \STATE $P_4 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A_4, B_3 - B_1$}
    \STATE $P_5 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A_1 + A_4, B_1 + B_4$}
    \STATE $P_6 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A_2 - A_4, B_3 + B_4$}
    \STATE $P_7 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A_1 - A_3, B_1 + B_2$}
    \RETURN $\begin{bmatrix} P_5+P_4-P_2+P_6 & P_1+P_2 \\ P_3+P_4 & P_1+P_5-P_3-P_7 \end{bmatrix}$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
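Here is a Python sketch of the same recursion; NumPy is used only for convenient sub-matrix slicing and addition, and the threshold of 16 is an assumed starting point to tune for your platform, just as the pseudocode suggests.

    import numpy as np

    def strassen_matrix_multiply(a, b, threshold=16):
        """Multiply two n x n NumPy arrays where n is a power of two."""
        n = a.shape[0]
        if n <= threshold:
            return a @ b  # fall back to the straightforward product

        # Split each matrix into four sub-matrices a quarter the size of the original
        h = n // 2
        a1, a2, a3, a4 = a[:h, :h], a[:h, h:], a[h:, :h], a[h:, h:]
        b1, b2, b3, b4 = b[:h, :h], b[:h, h:], b[h:, :h], b[h:, h:]

        p1 = strassen_matrix_multiply(a1, b2 - b4, threshold)
        p2 = strassen_matrix_multiply(a1 + a2, b4, threshold)
        p3 = strassen_matrix_multiply(a3 + a4, b1, threshold)
        p4 = strassen_matrix_multiply(a4, b3 - b1, threshold)
        p5 = strassen_matrix_multiply(a1 + a4, b1 + b4, threshold)
        p6 = strassen_matrix_multiply(a2 - a4, b3 + b4, threshold)
        p7 = strassen_matrix_multiply(a1 - a3, b1 + b2, threshold)

        # Reassemble the four quadrants of the product
        return np.block([[p5 + p4 - p2 + p6, p1 + p2],
                         [p3 + p4, p1 + p5 - p3 - p7]])

    a = np.random.rand(64, 64)
    b = np.random.rand(64, 64)
    assert np.allclose(strassen_matrix_multiply(a, b), a @ b)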

Strassen’s algorithm is useful in practice; unfortunately, the same cannot be said for all algorithms. For instance, the Coppersmith–Winograd matrix multiplication algorithm has a runtime of $O(n^{2.376})$; however, it has no practical purpose. The constants are so massive that there is no known terrestrial dataset large enough to take advantage of the improved asymptotic runtime. Algorithms such as this are known as galactic algorithms7.

Key Takeaways

  • In practice, the constants hidden by asymptotic notation can be large enough to render algorithms with higher complexity preferable.

Context Matters

The underlying message of the previous sections is that context matters. There are no absolutes. The best alternative is rarely clear and it’s always different depending on the situation in which it is applied. Asymptotic analysis typically addresses the general case. It’s up to you, as a software professional, to evaluate your specific situation. Below are a few considerations above and beyond algorithmic complexity.

  • It may be possible to manipulate your data to take advantage of a best-case runtime ($\Omega$) rather than the worst-case runtime ($O$).
  • It may be possible to parallelize computation thus raising the total number of requisite machine cycles while lowering the time as measured by a clock.
  • Memory consumption may be more important than runtime in some contexts.
  • Machine architecture can have a huge impact on actual runtime.

Above all, understand that the intricacies that comprise runtime as measured by a clock are staggering. More often than not, microseconds just don’t matter. However, when they are at a premium, the only responsible course of action is benchmarking. Test several algorithm variations on representative hardware and data, adjust them, and test again. This seems painfully obvious, but it’s all too common for software professionals to neglect the practice. Just to be absolutely clear, never attempt a performance optimization without repeatable benchmarks.
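As one hedged illustration of what repeatable benchmarking can look like, the sketch below times the two closest-pair functions from earlier (assumed to be defined or imported alongside it) on randomly generated data using Python’s standard timeit module. The input sizes and repetition counts are placeholders to adapt to your own hardware and representative data.

    import random
    import timeit

    # closest_pair_brute_force and closest_pair are the sketches from earlier

    def benchmark(func, points, repeats=3, number=1):
        """Return the best per-call wall-clock time in seconds."""
        timer = timeit.Timer(lambda: func(points))
        return min(timer.repeat(repeat=repeats, number=number)) / number

    if __name__ == "__main__":
        random.seed(42)  # a fixed seed keeps the runs repeatable
        for n in (200, 1_000, 2_000):
            points = [(random.random(), random.random()) for _ in range(n)]
            brute = benchmark(closest_pair_brute_force, points)
            fast = benchmark(closest_pair, points)
            print(f"n={n:>5}  brute force: {brute:.4f}s  divide & conquer: {fast:.4f}s")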

To sum this up, the algorithm with the lowest algorithmic complexity may not be the most performant in a particular application. It is an important data point; however, there are many more to consider.

Key Takeaways

  • There is no best solution. Software professionals have the responsibility to find the solution that best conforms to their context.
  • There are other ways to improve performance beyond reducing asymptotic complexity.
  • Never attempt any performance optimization without clear benchmarks on representative hardware.
  1. A quick Google search reveals many sources that attribute this quote to Albert Einstein. The original source of the quote is ambiguous as there are conflicting accounts. 

  2. Oxford Reference: https://www.oxfordreference.com/view/10.1093/oi/authority.20110803100346221 

  3. Technically, the principle of parsimony is a law for describing natural phenomena, and software does not materialize naturally. In this context, it is prescriptive rather than descriptive. 

  4. The Art of Computer Programming (TAOCP) by Donald Knuth is the most influential computer science work ever created. It shaped the industry. It’s next to impossible to find an algorithm textbook that doesn’t reference it. American Scientist listed TAOCP as one of the best twelve physical-science monographs of the century.

    Below is a listing of the latest volumes of TAOCP as of the time of this writing.

  5. From Donald Knuth’s 1976 paper entitled Structured Programming With goto Statements. It was printed in the book Classics in Software Engineering in 1979. It is available on Amazon using the link below. https://www.amazon.com/dp/0917072146/ref=cm_sw_r_tw_dp_U_x_nZXTEb7NCRCTP 

  6. Linearithmic is a portmanteau of linear and logarithmic meaning $n \log{n}$. Although it violates the “no constants and lower order terms” rule, it’s often useful in practice. The lesson here is that it’s perfectly acceptable to bend the rules for the sake of clarity. 

  7. The name galactic algorithm originates from Richard Lipton and Ken Regan. Due to the highly theoretical nature of the topic, this is its only mention on this site. Interested readers can consult the following link for more information: https://rjlipton.wordpress.com/2010/10/23/galactic-algorithms/