Prerequisites
graph LR
ALG(["fas:fa-trophy Real World Asymptotic Analysis fas:fa-trophy &#160;"])
ASY_ANA(["fas:fa-check Asymptotic Analysis &#160;"])
click ASY_ANA "./mathasymptoticanalysis"
ASY_ANA --> ALG
Without a doubt, asymptotic analysis is the most important tool in any algorithm designer’s toolbox. It provides a conceptual framework for divining the runtime of an algorithm without executing it. Many of the field’s advancements would not have been possible without it. Regardless, please heed these words of caution: anything taken to an extreme has deleterious effects. It’s easy to obsess over asymptotic complexity (and, by extension, performance) as it is the focal point of the study of algorithms. However, the very core of software engineering is managing tradeoffs. Overweighting a single variable is to be naive of reality. Asymptotic complexity is a data point for assessing the utility of an algorithm; it is not the whole story. The purpose of this page is to provide the reader with a holistic view of algorithmic analysis.
Principle of Parsimony
In other scientific domains, the concept of simplicity is a given. Albert Einstein is quoted as saying, “everything should be made as simple as possible, but not simpler.”^{1} The sciences even have a term for the concept: the principle of parsimony (aka the law of parsimony or Occam’s razor). Roughly defined, it is “the principle that the most acceptable explanation of an occurrence, phenomenon, or event is the simplest, involving the fewest entities, assumptions, or changes.”^{2} For algorithms, this means the best implementation is the simplest code that meets all business needs, involving the fewest entities^{3}.
The principle of parsimony implies that a software professional’s first and foremost responsibility is to deliver the simplest possible solution that meets the business need. Sometimes meeting the business need entails squeezing out every last ounce of performance. However, that is not typically the case. Pragmatism is the key to writing successful software. For those who doubt the veracity of this claim, consider this: Donald Knuth literally wrote the book on algorithmic analysis^{4}, and there is no greater authority on the subject. The following quote comes directly from his writings^{5}.
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. (emphasis added) - Donald Knuth
Much “blood and ink have been spilt” on the merits of software simplicity. A full treatment of the subject is well beyond the scope of this work. It’s sufficient to say that if you must obsess over a software metric, the most appropriate metric is pragmatism. The result of pragmatism is the tenacious pursuit of value. Value is roughly defined in the equation below.
\[\text{value} = \frac{\text{utility}}{\text{complexity}}\]

The more complex a solution, the lower its value. To firm this up, consider a fictitious example scenario. Suppose you are tasked with creating an algorithm capable of identifying the two closest objects in a two-dimensional plane. The algorithm below, which has a quadratic runtime ($\Theta(n^2)$), is simple and effective.
\begin{algorithm}
\caption{Closest Pair - Two Dimensions - Brute Force}
\begin{algorithmic}
\REQUIRE $A$ = set of 2-tuples representing $x$ and $y$ coordinates on a plane having $\vert A \vert \geq 2$
\OUTPUT the distance between the two closest points as measured by Euclidean distance
\FUNCTION{closestPairBruteForce}{A}
\STATE closestdist $\gets$ \CALL{distance}{$A_0$, $A_1$}
\STATE
\FOR{$p1 \in A$}
\FOR{$p2 \in A$}
\IF{$p1 \neq p2$ and \CALL{distance}{$p1$, $p2$} $\lt$ closestdist}
\STATE closestdist $\gets$ \CALL{distance}{$p1$, $p2$}
\ENDIF
\ENDFOR
\ENDFOR
\STATE
\RETURN closestdist
\ENDFUNCTION
\STATE
\REQUIRE $p1, p2$ = 2-tuples representing $x$ and $y$ coordinates
\OUTPUT Euclidean distance between p1 and p2
\FUNCTION{distance}{$p1$, $p2$}
\RETURN $\sqrt{(p2_x - p1_x)^2 + (p2_y - p1_y)^2}$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
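The brute-force pseudocode translates almost line for line into running code. Below is a minimal Python sketch of it; the function and variable names are illustrative, not from the original:

```python
import math

def distance(p1, p2):
    # Euclidean distance between two (x, y) tuples
    return math.sqrt((p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2)

def closest_pair_brute_force(points):
    # Compare every pair of points exactly once: Theta(n^2) comparisons
    closest = distance(points[0], points[1])
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            closest = min(closest, distance(points[i], points[j]))
    return closest
```

Iterating `j` from `i + 1` avoids comparing a point to itself and skips mirror-image pairs, but the runtime remains quadratic.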
Is there a more efficient solution? The answer is yes! The algorithm below is “linearithmic”^{6} ($\Theta(n \log_2{n})$). Don’t get distracted trying to thoroughly understand it as it is explained on a different page. However, take a few minutes to appreciate the complexity.
\begin{algorithm}
\caption{Closest Points}
\begin{algorithmic}
\REQUIRE $A$ = set of 2-tuples representing $x$ and $y$ coordinates on a plane having $\vert A \vert \geq 2$
\OUTPUT the distance between the two closest points as measured by Euclidean distance
\FUNCTION{closestPair}{A}
\STATE Ax $\gets$ A sorted by the x coordinate
\STATE Ay $\gets$ A sorted by the y coordinate
\RETURN \CALL{closestPairRecursive}{$Ax$, $Ay$}
\ENDFUNCTION
\STATE
\REQUIRE $Ax$ = set of 2-tuples representing $x$ and $y$ coordinates sorted by the $x$ coordinate
\REQUIRE $Ay$ = set of 2-tuples representing $x$ and $y$ coordinates sorted by the $y$ coordinate
\OUTPUT the distance between the two closest points as measured by Euclidean distance
\FUNCTION{closestPairRecursive}{Ax, Ay}
\STATE $n \gets \vert Ax \vert$
\STATE
\IF{n $\leq$ 3}
\RETURN \CALL{closestPairBruteForce}{Ax}
\ENDIF
\STATE
\STATE \COMMENT{Divide the points down the middle of the x axis}
\STATE mid $\gets \lfloor \frac{n}{2} \rfloor$
\STATE leftcut $\gets$ $\{Ax_0...Ax_{\text{mid} - 1}\}$
\STATE rightcut $\gets$ $\{Ax_{\text{mid}}...Ax_{n-1}\}$
\STATE
\STATE \COMMENT{This x value is the dividing point}
\STATE divide $\gets Ax_{\text{mid} - 1}.x$
\STATE
\STATE \COMMENT{Split the ysorted points into halves that match leftcut and rightcut}
\STATE \COMMENT{This eliminates the need for an additional $O(n \log_2{n})$ sort operation}
\STATE lefty $\gets \emptyset$
\STATE righty $\gets \emptyset$
\FOR{$i \in Ay$}
\IF{$i.x \leq$ divide}
\STATE lefty $\gets \text{lefty} \cup \{i\}$
\ELSE
\STATE righty $\gets \text{righty} \cup \{i\}$
\ENDIF
\ENDFOR
\STATE
\STATE leftclosestdist $\gets$ \CALL{closestPairRecursive}{leftcut, lefty}
\STATE rightclosestdist $\gets$ \CALL{closestPairRecursive}{rightcut, righty}
\STATE
\STATE closestdist $\gets$ \CALL{min}{leftclosestdist, rightclosestdist}
\STATE
\STATE \COMMENT{Create the strip}
\STATE strip $\gets \emptyset$
\FOR{$i \in Ay$}
\STATE \COMMENT{$\vert \vert$ denotes absolute value in this instance}
\IF{$\vert i.x - \text{divide} \vert \lt$ closestdist}
\STATE strip $\gets \text{strip} \cup \{i\}$
\ENDIF
\ENDFOR
\STATE
\STATE \COMMENT{Compare each point in the strip to the points within closestdist below it}
\FOR{$i \gets 0$ to $\vert\text{strip}\vert - 1$}
\STATE $j \gets i + 1$
\STATE \COMMENT{This while loop will never execute more than 6 times}
\WHILE{$j < \vert\text{strip}\vert$ and $(\text{strip}_{j}.y - \text{strip}_i.y) \lt$ closestdist}
\STATE dist $\gets$ \CALL{distance}{$\text{strip}_i, \text{strip}_j$}
\IF{dist $\lt$ closestdist}
\STATE closestdist $\gets$ dist
\ENDIF
\STATE $j \gets j + 1$
\ENDWHILE
\ENDFOR
\STATE
\RETURN closestdist
\ENDFUNCTION
\STATE
\REQUIRE $p1, p2$ = 2-tuples representing $x$ and $y$ coordinates
\OUTPUT Euclidean distance between p1 and p2
\FUNCTION{distance}{$p1$, $p2$}
\RETURN $\sqrt{(p2_x - p1_x)^2 + (p2_y - p1_y)^2}$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
Comparing the algorithms, it’s not immediately obvious, but Algorithm 2 is the more performant of the two. However, it has considerably higher implementation and maintenance costs, while Algorithm 1 is more malleable. Furthermore, Algorithm 2 is close to incomprehensible without commentary. The only reasonable justification for using Algorithm 2 is the case where the performance boost provides a competitive advantage that offsets the complexity costs. The performance deficit is likely unnoticeable with meagerly sized inputs or rare invocations. It is advisable to start with the simplest algorithm and only upgrade in the event of a noticeable efficiency shortfall.
Key Takeaways
The principle of parsimony implies that a software professional’s first and foremost responsibility is to deliver the simplest possible solution that meets the business need.
 Pragmatism is a crucial component of designing and writing successful software.
 $\text{value} = \frac{\text{utility}}{\text{complexity}}$
 The best strategy is to start with the simplest implementation and upgrade only in the event of a noticeable performance deficit.
Constants Matter!
The Asymptotic Analysis section made it clear that growth rate is the most important runtime component. While this is true, that doesn’t mean that the constants and lower order terms are wholly irrelevant. To demonstrate, consider the case of matrix multiplication. Recognize that it is an extreme example; regardless, it demonstrates the concept beautifully.
The straightforward algorithm, shown below, is $\Theta(n^3)$. Again, do not attempt to master the algorithms on this page just yet as they receive full treatment later on. Take a few minutes to appreciate the complexity or rather the lack thereof.
\begin{algorithm}
\caption{Naive Matrix Multiplication}
\begin{algorithmic}
\REQUIRE $n$ = size of the matrices
\REQUIRE $A,B$ = $n \times n$ matrices
\OUTPUT $A * B$
\FUNCTION{naiveMatrixMultiply}{$n,A,B$}
\STATE $C \gets n \times n$ matrix initialized to all zeros
\FOR{$i \gets 0$ to $n1$}
\FOR{$j \gets 0$ to $n1$}
\FOR{$k \gets 0$ to $n1$}
\STATE $C_{i,j} \gets C_{i,j} + A_{i,k} * B_{k,j}$
\ENDFOR
\ENDFOR
\ENDFOR
\RETURN $C$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
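As a point of reference, the naive algorithm is only a few lines in any language. Here is a Python sketch (the function name is illustrative):

```python
def naive_matrix_multiply(n, A, B):
    # Theta(n^3): n^2 result cells, each an n-term dot product
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
```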
Now consider Strassen’s matrix multiplication algorithm, shown below, which has a runtime of $\Theta(n^{\log_2{7}}) \approx \Theta(n^{2.81})$. Contemplating asymptotic complexity in a vacuum would lead one to assume that this is a superior algorithm in all cases. However, the constants hidden by the notation are unusually high. Even with a carefully tuned implementation, a performance gain isn’t obvious with matrices smaller than $1024 \times 1024$.
\begin{algorithm}
\caption{Strassen Matrix Multiplication}
\begin{algorithmic}
\REQUIRE $n$ = size of the matrices where $n$ is a power of 2 (4, 8, 16, 32,...)
\REQUIRE $A,B$ = $n \times n$ matrices
\OUTPUT $A * B$
\FUNCTION{strassenMatrixMultiply}{$n,A,B$}
\IF{$n \leq 16$} \COMMENT{tune this value for your platform}
\RETURN \CALL{naiveMatrixMultiply}{$n, A, B$}
\ENDIF
\STATE \COMMENT{Split the matrices into submatrices that are $\frac{1}{4}$ the size of the original}
\STATE $\begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix} \gets \text{submatrices of} \; A$
\STATE $\begin{bmatrix} B_1 & B_2 \\ B_3 & B_4 \end{bmatrix} \gets \text{submatrices of} \; B$
\STATE
\STATE $P_1 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A_1, B_2 - B_4$}
\STATE $P_2 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A_1 + A_2, B_4$}
\STATE $P_3 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A_3 + A_4, B_1$}
\STATE $P_4 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A_4, B_3 - B_1$}
\STATE $P_5 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A_1 + A_4, B_1 + B_4$}
\STATE $P_6 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A_2 - A_4, B_3 + B_4$}
\STATE $P_7 \gets$ \CALL{strassenMatrixMultiply}{$\frac{n}{2}, A_1 - A_3, B_1 + B_2$}
\RETURN $\begin{bmatrix} P_5 + P_4 - P_2 + P_6 & P_1 + P_2 \\ P_3 + P_4 & P_1 + P_5 - P_3 - P_7 \end{bmatrix}$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
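The following Python sketch implements the same seven-product recursion, falling back to the naive algorithm below a tunable cutoff. Like the pseudocode, it assumes square matrices whose size is a power of 2; all names are illustrative:

```python
def _add(X, Y):
    # Element-wise matrix addition
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def _sub(X, Y):
    # Element-wise matrix subtraction
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def naive_multiply(A, B):
    # Theta(n^3) schoolbook multiplication
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def strassen_multiply(A, B, cutoff=16):
    n = len(A)
    if n <= cutoff:  # below the cutoff the naive algorithm wins; tune per platform
        return naive_multiply(A, B)
    m = n // 2
    # Split each matrix into quadrants: [[X1, X2], [X3, X4]]
    a1 = [row[:m] for row in A[:m]]; a2 = [row[m:] for row in A[:m]]
    a3 = [row[:m] for row in A[m:]]; a4 = [row[m:] for row in A[m:]]
    b1 = [row[:m] for row in B[:m]]; b2 = [row[m:] for row in B[:m]]
    b3 = [row[:m] for row in B[m:]]; b4 = [row[m:] for row in B[m:]]
    # Seven recursive products instead of the obvious eight
    p1 = strassen_multiply(a1, _sub(b2, b4), cutoff)
    p2 = strassen_multiply(_add(a1, a2), b4, cutoff)
    p3 = strassen_multiply(_add(a3, a4), b1, cutoff)
    p4 = strassen_multiply(a4, _sub(b3, b1), cutoff)
    p5 = strassen_multiply(_add(a1, a4), _add(b1, b4), cutoff)
    p6 = strassen_multiply(_sub(a2, a4), _add(b3, b4), cutoff)
    p7 = strassen_multiply(_sub(a1, a3), _add(b1, b2), cutoff)
    # Recombine the products into the four result quadrants
    c1 = _add(_sub(_add(p5, p4), p2), p6)  # top left
    c2 = _add(p1, p2)                      # top right
    c3 = _add(p3, p4)                      # bottom left
    c4 = _sub(_sub(_add(p1, p5), p3), p7)  # bottom right
    top = [c1[i] + c2[i] for i in range(m)]
    bottom = [c3[i] + c4[i] for i in range(m)]
    return top + bottom
```

The extra additions, subtractions, and copies on every level are exactly where the large hidden constants come from.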
Strassen’s algorithm is useful in practice; unfortunately, the same cannot be said for all algorithms. For instance, the Coppersmith-Winograd matrix multiplication algorithm has a runtime of $O(n^{2.376})$; however, it has no practical purpose. The constants are so massive that there is no known terrestrial dataset large enough to take advantage of the improved asymptotic runtime. Algorithms such as this are known as galactic algorithms^{7}.
Key Takeaways
 In practice, the constants hidden by asymptotic notation can be large enough to render algorithms with higher complexity preferable.
Context Matters
The underlying message of the previous sections is that context matters. There are no absolutes. The best alternative is rarely clear, and it always depends on the situation in which it is applied. Asymptotic analysis typically addresses the general case. It’s up to you, as a software professional, to evaluate your specific situation. Below are a few considerations above and beyond algorithmic complexity.
It may be possible to manipulate your data to take advantage of a best-case runtime ($\Omega$) rather than the worst-case runtime ($O$).
 It may be possible to parallelize computation thus raising the total number of requisite machine cycles while lowering the time as measured by a clock.
 Memory consumption may be more important than runtime in some contexts.
 Machine architecture can have a huge impact on actual runtime.
Above all, understand that the intricacies that comprise runtime as measured by a clock are staggering. More often than not, microseconds just don’t matter. However, when they are at a premium, the only responsible course of action is benchmarking. Test several algorithm variations on representative hardware and data, adjust them, and test again. This seems painfully obvious, but it’s all too common for software professionals to neglect the practice. Just to be absolutely clear, never attempt a performance optimization without repeatable benchmarks.
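A minimal benchmarking harness along these lines can be built with Python’s standard `timeit` module. The two functions below are illustrative stand-ins for real algorithm variants, and the input size and repeat count are assumptions to tune for your platform:

```python
import random
import timeit

# Two candidate implementations of the same computation (stand-ins
# for the real algorithm variants under consideration)
def sum_squares_loop(xs):
    total = 0
    for x in xs:
        total += x * x
    return total

def sum_squares_builtin(xs):
    return sum(x * x for x in xs)

# Representative data with a fixed seed, so the benchmark is repeatable
random.seed(42)
data = [random.random() for _ in range(10_000)]

for fn in (sum_squares_loop, sum_squares_builtin):
    elapsed = timeit.timeit(lambda: fn(data), number=100)
    print(f"{fn.__name__}: {elapsed:.4f}s for 100 runs")
```

Note the two essentials: a fixed seed makes runs repeatable, and `number=100` amortizes away per-call timer noise. Swap in your own candidate functions and representative data.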
To sum this up, the algorithm with the lowest algorithmic complexity may not be the most performant in a particular application. It is an important data point; however, there are many more to consider.
Key Takeaways
 There is no best solution. Software professionals have the responsibility to find the solution that best conforms to their context.
 There are other ways to improve performance beyond reducing asymptotic complexity.
 Never attempt any performance optimization without clear benchmarks on representative hardware.

A quick Google search reveals many sources that attribute this quote to Albert Einstein. The original source of the quote is ambiguous as there are conflicting accounts. ↩

Oxford Reference: https://www.oxfordreference.com/view/10.1093/oi/authority.20110803100346221 ↩

Technically, the principle of parsimony is a law for describing natural phenomena, and software does not materialize naturally. In this context, it is prescriptive rather than descriptive. ↩

The Art of Computer Programming (TAOCP) by Donald Knuth is the most influential computer science work ever created. It shaped the industry. It’s next to impossible to find an algorithm textbook that doesn’t reference it. American Scientist listed TAOCP as one of the best twelve physical-science monographs of the century. ↩

From Donald Knuth’s 1974 paper entitled Structured Programming with go to Statements. It was reprinted in the book Classics in Software Engineering in 1979, which is available on Amazon: https://www.amazon.com/dp/0917072146 ↩

Linearithmic is a portmanteau of linear and logarithmic meaning $n \log{n}$. Although it violates the “no constants and lower order terms” rule, it’s often useful in practice. The lesson here is that it’s perfectly acceptable to bend the rules for the sake of clarity. ↩

The name galactic algorithm originates from Richard Lipton and Ken Regan. Due to the highly theoretical nature of the topic, this is its only mention on this site. Interested readers can consult the following link for more information: https://rjlipton.wordpress.com/2010/10/23/galactic-algorithms/ ↩