Prerequisites (click to view)
graph LR
ALG(["fas:fa-trophy Real World Asymptotic Analysis fas:fa-trophy #160;"])
ASY_ANA(["fas:fa-check Asymptotic Analysis #160;"])
click ASY_ANA "./mathasymptoticanalysis"
ASY_ANA --> ALG
Without a doubt, asymptotic analysis is the most important tool in any algorithm designer’s toolbox. It provides a conceptual framework for divining the runtime of an algorithm without executing it. Many of the field’s advancements would not have been possible without it. Regardless, please heed these words of caution: anything taken to an extreme has deleterious effects. It’s easy to obsess over asymptotic complexity (and performance by extension) because it is the focal point of the study of algorithms. However, the very core of software engineering is managing tradeoffs, and overweighting a single variable is to be naive of reality. Asymptotic complexity is one data point for assessing the utility of an algorithm; it is not the whole story. The purpose of this page is to provide the reader with a holistic view of algorithmic analysis.
Principle of Parsimony
In other scientific domains, the concept of simplicity is a given. Albert Einstein is quoted as saying, “everything should be made as simple as possible, but not simpler.^{1}” The sciences even have a term for the concept: the principle of parsimony (aka law of parsimony or Occam’s razor). Roughly defined, it is “the principle that the most acceptable explanation of an occurrence, phenomenon, or event is the simplest, involving the fewest entities, assumptions, or changes.”^{2} For algorithms, this means the best implementation is the simplest code that meets all business needs, involving the fewest entities^{3}.
The principle of parsimony implies that a software professional’s first and foremost responsibility is to deliver the simplest possible solution that meets the business need. Sometimes meeting the business need entails squeezing out every last ounce of performance. However, that is not typically the case. Pragmatism is the key to writing successful software. For those who doubt the veracity of this claim, consider this: Donald Knuth literally wrote the book on algorithmic analysis^{4}, and there is no greater authority on the subject. The following quote comes directly from his writings^{5}.
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. (emphasis added) - Donald Knuth
Much “blood and ink have been spilt” on the merits of software simplicity. A full treatment of the subject is well beyond the scope of this work. It’s sufficient to say, if you must obsess over a software metric, the most appropriate metric is pragmatism. The result of pragmatism is the tenacious pursuit of value. Value is roughly defined in the equation below.
\[\text{value} = \frac{\text{utility}}{\text{complexity}}\]
The more complex a solution, the lower its value. To firm this up, consider a fictitious example scenario. Suppose you are tasked with creating an algorithm capable of identifying the two closest objects in a two-dimensional plane. The algorithm below, which has a quadratic runtime ($\Theta(n^2)$), is simple and effective.
\begin{algorithm}
\caption{Brute Force Closest Pair}
\begin{algorithmic}
\REQUIRE $A$ = set of 2-tuples representing $x$ and $y$ coordinates on a plane having $\vert A \vert \geq 2$
\OUTPUT $\arg\min_{(p1, p2) \in A}{\sqrt{(p2_x - p1_x)^2 + (p2_y - p1_y)^2}}$
\FUNCTION{closestPairBruteForce}{A}
\STATE min $\gets$ \CALL{distance}{$A_0$, $A_1$}
\STATE minpoints $\gets (A_0, A_1)$
\STATE
\FOR{$p1 \in A$}
\FOR{$p2 \in A$}
\IF{$p1 \neq p2$ and \CALL{distance}{$p1$, $p2$} $\lt$ min}
\STATE min $\gets$ \CALL{distance}{$p1$, $p2$}
\STATE minpoints $\gets (p1, p2)$
\ENDIF
\ENDFOR
\ENDFOR
\STATE
\RETURN minpoints
\ENDFUNCTION
\STATE
\REQUIRE $p1, p2$ = 2-tuples representing $x$ and $y$ coordinates
\OUTPUT $\sqrt{(p2_x - p1_x)^2 + (p2_y - p1_y)^2}$
\FUNCTION{distance}{$p1$, $p2$}
\RETURN $\sqrt{(p2_x - p1_x)^2 + (p2_y - p1_y)^2}$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
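As a concrete (hypothetical) sketch, the brute-force search above can be expressed in a few lines of Python. The function name is my own, and `math.dist` stands in for the pseudocode’s `distance` helper:

```python
from itertools import combinations
from math import dist

def closest_pair_brute_force(points):
    """Return the closest pair of points by checking every combination.

    `points` is a sequence of (x, y) tuples with at least two elements.
    """
    assert len(points) >= 2
    # combinations() yields each unordered pair exactly once, so the
    # p1 != p2 guard from the pseudocode is unnecessary here.
    return min(combinations(points, 2), key=lambda pair: dist(*pair))
```

Using `itertools.combinations` examines each pair once instead of twice, but the runtime is still quadratic in the number of points.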
Is there a more efficient solution? The answer is yes! The algorithm below is “linearithmic”^{6} ($\Theta(n \log_2{n})$). Don’t get distracted trying to thoroughly understand it as it is explained on a different page. However, take a few minutes to appreciate the complexity.
\begin{algorithm}
\caption{Closest Points}
\begin{algorithmic}
\REQUIRE $A$ = set of 2-tuples representing $x$ and $y$ coordinates on a plane having $\vert A \vert \geq 2$
\OUTPUT $\min_{(p1, p2) \in A}{\sqrt{(p2_x - p1_x)^2 + (p2_y - p1_y)^2}}$
\FUNCTION{closestPoints}{A}
\STATE $Ax \gets$ A sorted by $x$ coordinate
\STATE $Ay \gets$ A sorted by $y$ coordinate
\RETURN \CALL{recursiveClosestPoint}{$Ax$, $Ay$}
\ENDFUNCTION
\STATE
\REQUIRE $Ax$ = set of 2-tuples representing $x$ and $y$ coordinates sorted by $x$ coordinate
\REQUIRE $Ay$ = set of the same 2-tuples in $Ax$ sorted by $y$ coordinate
\OUTPUT $\min_{(p1, p2) \in Ax}{\sqrt{(p2_x - p1_x)^2 + (p2_y - p1_y)^2}}$
\FUNCTION{recursiveClosestPoint}{$Ax$, $Ay$}
\STATE $n \gets \vert Ax \vert$
\STATE
\IF{$n \leq 3$}
\STATE $(p1, p2) \gets$ \CALL{closestPairBruteForce}{$Ax$}
\RETURN \CALL{distance}{$p1$, $p2$}
\ENDIF
\STATE
\STATE half $\gets \Bigl\lfloor \frac{n}{2} \Bigr\rfloor$
\STATE leftx $\gets \{Ax_0...Ax_{half}\}$
\STATE rightx $\gets \{Ax_{half + 1}...Ax_{n-1}\}$
\STATE lefty $\gets \emptyset$
\STATE righty $\gets \emptyset$
\STATE
\STATE mid $\gets Ax_{half}$
\STATE midx $\gets \text{mid}_x$
\FOR{$i \in Ay$}
\IF{$i_x \leq$ midx}
\STATE lefty $\gets \text{lefty} \cup \{i\}$
\ELSE
\STATE righty $\gets \text{righty} \cup \{i\}$
\ENDIF
\ENDFOR
\STATE
\STATE smallestleft $\gets$ \CALL{recursiveClosestPoint}{leftx, lefty}
\STATE smallestright $\gets$ \CALL{recursiveClosestPoint}{rightx, righty}
\STATE
\STATE delta $\gets \min\{\text{smallestleft}, \text{smallestright}\}$
\STATE smallestsplit $\gets$ \CALL{closestSplitPair}{Ax, Ay, delta}
\STATE
\RETURN $\min\{\text{delta}, \text{smallestsplit}\}$
\ENDFUNCTION
\STATE
\REQUIRE $Ax$ = set of 2-tuples representing $x$ and $y$ coordinates sorted by $x$ coordinate
\REQUIRE $Ay$ = set of the same 2-tuples in $Ax$ sorted by $y$ coordinate
\REQUIRE delta = minimum distance found in either half
\OUTPUT the smallest distance between pairs of points split by the vertical line $x = \bar{x}$, or $\infty$ if no such pair is closer than delta
\FUNCTION{closestSplitPair}{$Ax$, $Ay$, delta}
\STATE n $\gets \vert Ax \vert$
\STATE half $\gets \Bigl\lfloor \frac{n}{2} \Bigr\rfloor$
\STATE mid $\gets Ax_{half}$
\STATE $\bar{x} \gets \text{mid}_x$
\STATE $S \gets \emptyset$
\STATE maxx $\gets \bar{x}$ + delta
\STATE minx $\gets \bar{x} -$ delta
\FOR{$p \in Ay$}
\IF{$p_x \gt$ minx and $p_x \lt$ maxx}
\STATE $S \gets S \cup \{p\}$
\ENDIF
\ENDFOR
\STATE smallestsplit $\gets \infty$
\FOR{i $\gets$ 0 to $\vert S \vert  1$}
\FOR{j $\gets$ i + 1 to $\min\{i + 7, \vert S \vert - 1\}$}
\IF{\CALL{distance}{$S_i$, $S_j$} $\lt$ smallestsplit}
\STATE smallestsplit $\gets$ \CALL{distance}{$S_i$, $S_j$}
\ENDIF
\ENDFOR
\ENDFOR
\RETURN smallestsplit
\ENDFUNCTION
\STATE
\REQUIRE $p1, p2$ = 2-tuples representing $x$ and $y$ coordinates
\OUTPUT $\sqrt{(p2_x - p1_x)^2 + (p2_y - p1_y)^2}$
\FUNCTION{distance}{$p1$, $p2$}
\RETURN $\sqrt{(p2_x - p1_x)^2 + (p2_y - p1_y)^2}$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
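To make the divide-and-conquer scheme tangible, here is a rough Python rendering that returns the minimum distance rather than the closest pair. It is a sketch, not a definitive implementation: it assumes distinct points given as `(x, y)` tuples, and all names are my own:

```python
from math import dist

def closest_pair_distance(points):
    """Divide-and-conquer closest pair; returns the smallest pairwise distance."""
    ax = sorted(points)                      # sorted by x (then y)
    ay = sorted(points, key=lambda p: p[1])  # sorted by y
    return _recurse(ax, ay)

def _brute_force(pts):
    return min(dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:])

def _recurse(ax, ay):
    n = len(ax)
    if n <= 3:
        return _brute_force(ax)

    half = n // 2
    mid_x = ax[half][0]
    left_x, right_x = ax[:half], ax[half:]
    # Partition the y-sorted list around the dividing line, preserving y order.
    left_set = set(left_x)
    left_y = [p for p in ay if p in left_set]
    right_y = [p for p in ay if p not in left_set]

    delta = min(_recurse(left_x, left_y), _recurse(right_x, right_y))

    # Check pairs that straddle the dividing line, within delta of it.
    strip = [p for p in ay if abs(p[0] - mid_x) < delta]
    best = delta
    for i, p in enumerate(strip):
        for q in strip[i + 1:i + 8]:  # only the next 7 strip points can matter
            best = min(best, dist(p, q))
    return best
```

The two sorts dominate the setup at $\Theta(n \log_2{n})$, and each level of the recursion does linear work, which is where the linearithmic bound comes from.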
Comparing the algorithms, it’s not immediately obvious, but Algorithm 2 is indeed more performant. However, it has considerably higher implementation and maintenance costs, while Algorithm 1 is more malleable. Furthermore, Algorithm 2 is close to incomprehensible without commentary. The only reasonable justification for using Algorithm 2 is when the performance boost provides a competitive advantage that offsets the complexity costs. With modestly sized inputs or rare invocations, the performance deficit is likely unnoticeable. It is advisable to start with the simplest algorithm and only upgrade in the event of a noticeable efficiency shortfall.
Key Takeaways
 The principle of parsimony implies that a software professional’s first and foremost responsibility is to deliver the simplest possible solution that meets the business need.
 Pragmatism is a crucial component of designing and writing successful software.
 $\text{value} = \frac{\text{utility}}{\text{complexity}}$
 The best strategy is to start with the simplest implementation and upgrade only in the event of a noticeable performance deficit.
Constants Matter!
The Asymptotic Analysis section made it clear that growth rate is the most important runtime component. While this is true, that doesn’t mean that the constants and lower order terms are wholly irrelevant. To demonstrate, consider the case of matrix multiplication. Recognize that it is an extreme example; regardless, it demonstrates the concept beautifully.
The straightforward algorithm, shown below, is $\Theta(n^3)$. Again, do not attempt to master the algorithms on this page just yet as they receive full treatment later on. Take a few minutes to appreciate the complexity or rather the lack thereof.
\begin{algorithm}
\caption{Naive Matrix Multiplication}
\begin{algorithmic}
\REQUIRE $n$ = size of the matrices
\REQUIRE $A,B$ = $n \times n$ matrices
\OUTPUT $A * B$
\FUNCTION{naiveMatrixMultiply}{$n,A,B$}
\STATE $C \gets n \times n$ matrix of zeros
\FOR{$i \gets 0$ to $n1$}
\FOR{$j \gets 0$ to $n1$}
\FOR{$k \gets 0$ to $n1$}
\STATE $C_{i,j} \gets C_{i,j} + A_{i,k} * B_{k,j}$
\ENDFOR
\ENDFOR
\ENDFOR
\RETURN $C$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
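The three nested loops translate directly to Python. This sketch (names are my own) represents matrices as lists of lists:

```python
def naive_matrix_multiply(a, b):
    """Multiply two n-by-n matrices (lists of lists) with three nested loops."""
    n = len(a)
    c = [[0] * n for _ in range(n)]  # result matrix, initialized to zeros
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c
```

Each of the $n^2$ output cells requires an $n$-term dot product, which is exactly where the $\Theta(n^3)$ bound comes from.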
Now consider Strassen’s matrix multiplication algorithm as shown below, which has a runtime of $\Theta(n^{\log_2{7}}) \approx \Theta(n^{2.81})$. Contemplating asymptotic complexity in a vacuum would lead one to assume that this is a superior algorithm in all cases. However, the constants hidden by the notation are unusually high. Even with a carefully tuned implementation, a performance gain isn’t obvious with matrices smaller than $1024 \times 1024$.
\begin{algorithm}
\caption{Strassen Matrix Multiplication}
\begin{algorithmic}
\REQUIRE $n$ = size of the matrices where $n$ is a power of 2 (4, 8, 16, 32,...)
\REQUIRE $A,B$ = $n \times n$ matrices
\OUTPUT $A * B$
\FUNCTION{strassenMatrixMultiply}{$n,A,B$}
\IF{$n \leq 16$} \COMMENT{tune this value for your platform}
\RETURN \CALL{naiveMatrixMultiply}{$n$, $A$, $B$}
\ENDIF
\STATE $\begin{bmatrix} A & B \\ C & D \end{bmatrix} \gets \text{submatrices of} \; A$
\STATE $\begin{bmatrix} E & F \\ G & H \end{bmatrix} \gets \text{submatrices of} \; B$
\STATE $P_1 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A, F  H$}
\STATE $P_2 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A + B, H$}
\STATE $P_3 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, C + D, E$}
\STATE $P_4 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, D, G  E$}
\STATE $P_5 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A + D, E + H$}
\STATE $P_6 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, B  D, G + H$}
\STATE $P_7 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A  C, E + F$}
\RETURN $\begin{bmatrix} P_5 + P_4 - P_2 + P_6 & P_1 + P_2 \\ P_3 + P_4 & P_1 + P_5 - P_3 - P_7 \end{bmatrix}$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
Strassen’s algorithm is useful in practice; unfortunately, the same cannot be said for all algorithms. For instance, the Coppersmith–Winograd matrix multiplication algorithm has a runtime of approximately $O(n^{2.376})$; however, it has no practical purpose. The constants are so massive that there is no known terrestrial dataset large enough to take advantage of the improved asymptotic runtime. Algorithms such as this are known as galactic algorithms^{7}.
Key Takeaways
 In practice, the constants hidden by asymptotic notation can be large enough to render algorithms with higher complexity preferable.
Context Matters
The underlying message of the previous sections is that context matters. There are no absolutes. The best alternative is rarely clear and it’s always different depending on the situation in which it is applied. Asymptotic analysis typically addresses the general case. It’s up to you, as a software professional, to evaluate your specific situation. Below are a few considerations above and beyond algorithmic complexity.
 It may be possible to manipulate your data to take advantage of a best-case runtime ($\Omega$) rather than the worst-case runtime ($O$).
 It may be possible to parallelize computation thus raising the total number of requisite machine cycles while lowering the time as measured by a clock.
 Memory consumption may be more important than runtime in some contexts.
 Machine architecture can have a huge impact on actual runtime.
Above all, understand that the intricacies that comprise runtime as measured by a clock are staggering. More often than not, microseconds just don’t matter. However, when they are at a premium the only responsible course of action is benchmarking. Test several algorithm variations on representative hardware and data, adjust them, and test again. This seems painfully obvious but it’s all too common for software professionals to neglect the practice. Just to be absolutely clear, never attempt a performance optimization without repeatable benchmarks.
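As a minimal sketch of the practice, Python’s standard `timeit` module can compare candidate implementations on the same fixed data. The two sum-of-squares functions here are hypothetical stand-ins for real candidates; substitute your own implementations and representative data:

```python
import random
import timeit

# Two candidate implementations of the same operation (sum of squares).
def sum_squares_loop(xs):
    total = 0
    for x in xs:
        total += x * x
    return total

def sum_squares_builtin(xs):
    return sum(x * x for x in xs)

# Benchmark both candidates on identical, representative data.
data = [random.random() for _ in range(10_000)]
for fn in (sum_squares_loop, sum_squares_builtin):
    seconds = timeit.timeit(lambda: fn(data), number=100)
    print(f"{fn.__name__}: {seconds:.4f}s for 100 runs")
```

Note that both functions are timed against the same input so the comparison is repeatable; for production decisions, run the benchmark on hardware that matches your deployment target.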
To sum this up, the algorithm with the lowest algorithmic complexity may not be the most performant in a particular application. It is an important data point; however, there are many more to consider.
Key Takeaways
 There is no best solution. Software professionals have the responsibility to find the solution that best conforms to their context.
 There are other ways to improve performance beyond reducing asymptotic complexity.
 Never attempt any performance optimization without clear benchmarks on representative hardware.

A quick Google search reveals many sources that attribute this quote to Albert Einstein. The original source of the quote is ambiguous as there are conflicting accounts. ↩

Oxford Reference: https://www.oxfordreference.com/view/10.1093/oi/authority.20110803100346221 ↩

Technically, the principle of parsimony is a law for describing natural phenomenon and software does not materialize naturally. In this context, it is prescriptive rather than descriptive. ↩

The Art of Computer Programming (TAOCP) by Donald Knuth is the most influential computer science work ever created. It shaped the industry. It’s next to impossible to find an algorithm textbook that doesn’t reference it. American Scientist listed TAOCP as one of the best twelve physical-science monographs of the century. ↩

From Donald Knuth’s 1976 paper entitled Structured Programming With goto Statements. It was printed in the book Classics in Software Engineering in 1979. It is available on Amazon using the link below. https://www.amazon.com/dp/0917072146/ref=cm_sw_r_tw_dp_U_x_nZXTEb7NCRCTP ↩

Linearithmic is a portmanteau of linear and logarithmic meaning $n \log{n}$. Although it violates the “no constants and lower order terms” rule, it’s often useful in practice. The lesson here is that it’s perfectly acceptable to bend the rules for the sake of clarity. ↩

The name galactic algorithm originates from Richard Lipton and Ken Regan. Due to the highly theoretical nature of the topic, this is its only mention on this site. Interested readers can consult the following link for more information: https://rjlipton.wordpress.com/2010/10/23/galacticalgorithms/ ↩