Prerequisites (click to view)
graph LR
ASY_ANA(["fas:fatrophy Asymptotic Analysis fas:fatrophy"])
COM(["fas:facheck Combinatorics #160;"])
click COM "./mathcombinatorics"
SET_NOT(["fas:facheck Set Notation #160;"])
click SET_NOT "./mathsetnotation"
GRA(["fas:facheck Graphing #160;"])
click GRA "./mathgraphing"
COM & GRA & SET_NOT>ASY_ANA
This page explores using asymptotic analysis, sometimes referred to as Big O^{1}, to access the relative efficiency of algorithms. Like most mathematical concepts, asymptotic analysis is an abstraction for conveyance of large ideas in a compact form. Finding its roots in number theory (see history), it is a method of approximating the accuracy of function approximations at a point^{2}. Its primary utility is expressing the consumption of a constrained resource in terms of tradeoffs and measurement at scale. In the domain of computer science, that resource is typically algorithm runtime^{3}. This is one of the most important tools in an algorithm designer’s toolbox. It is the lens through which algorithms are evaluated and a foundational skill required for comprehension of any algorithm or data structure.
History
The concept of asymptotic analysis has evolved over the past ~130 years. In 1894, Paul Bachmann, introduced the concept in his treatise on number theory^{4}. Asymptotic notation was introduced by Edmund Landau in his 1909 book on the same topic^{5}. This is why (outside the computer science field anyway) the notation symbols are often referred to as Landau Symbols. In 1914, Godfrey Hardy and John Littlewood expanded on the notation by introducing a new symbol ($\Omega$  big omega)^{6}.
Asymptotic analysis wasn’t associated with algorithmics until the 1970s. Donald Knuth was instrumental in this effort. He solidified the connection between algorithms and asymptotics and redefined the notation in his 1976 paper entitled Big Omicron and big Omega and big Theta^{7}. His continued work popularized the concept. We all owe Mr. Knuth a great deal of gratitude for his contributions to the field of computer science.
What are we measuring?
At first glance, the question of “what are we measuring?” seems almost mundane. Intuitively, runtime is the time, as measured by a clock, that it takes an algorithm to execute in its entirety. This seems reasonable but it has one glaring deficiency: clock time measurements vary depending on hardware. How can two people compare the relative efficacy of different algorithms given that they have different machines? It’s not hard to conceive any number of elaborate solutions but there is an easy remedy: compare the approximate number of requisite machine operations. Regardless of the hardware^{8}, more operations require more time.
Notice the two emphasized words in the above paragraph: approximate and machine operations. It’s not at all practical to measure the actual number of machine instructions that an algorithm requires as this is dependent on many factors including programming language and machine implementation specifics. The precise number of operations is also variable depending on data types. For instance, the hardware implementations of integer and floatingpoint arithmetic are wholly divergent. To add to the conundrum, specific processors are capable of performing certain operations in parallel. To avoid this minutia, algorithmic analysis assumes a generic single process Random Access Machine (RAM)^{9} computational model.
The RAM model of computation assumes sequential execution of instructions, fixed word sizes^{10}, and standard integer and floatingpoint numbers. Its operations, listed below, closely resemble real computers.
 Arithmetic (add, subtract, multiply, divide, remainder, floor, ceiling)
 Data Movement (load, store, copy)
 Control (conditional branch, unconditional branch, subroutine call, return)
It’s important not to get too hung up on what constitutes a single instruction.
It’s perfectly acceptable to take liberties with the RAM model to a certain
extent. For instance, i = 10 + 5
is technically two operations: store and
subtract. However, for the purposes of algorithmic analysis, it’s considered a
single operation. Likewise, a complex calculation such as the Euclidean
distance between two points ($d = \sqrt{(x_2  x_1)^2 + (y_2  y_1)^2}$) is also
a single operation. Conversely, it’s equally important not to abuse the model.
One should not assume locating an item in a sea of a data is a single operation.
See the example pseudo code below.
\begin{algorithm}
\caption{8 Machine Operations}
\begin{algorithmic}
\STATE $j \gets 0$ \Comment{1 data operation}
\FOR{$i \in \{1,2,3\}$} \Comment{1 control operations $\times$ 3 iterations}
\STATE $j \gets i \times 2$ \Comment{1 arithmetic/data operations $\times$ 3 iterations}
\ENDFOR
\RETURN{j} \Comment{1 control operation}
\end{algorithmic}
\end{algorithm}
Understandably, the ambiguous means for calculating the number of operations required by an algorithm is troublesome to some. It bears repeating: the exact number is irrelevant. The goal is to ascertain the relative efficiency of an algorithm. One can reasonably assume that 100 operations, as loosely defined by the RAM model, requires less resources than 100,000 operations regardless of the machine on which it executes. In real world scenarios, it’s prudent to benchmark algorithms on physical machines when the number of operations is close.
Up to this point, there is one glaring omission with estimating algorithmic efficiency: algorithms that do not act on input data are rare. For the largest majority, the number of operations is relative to the input size. This is the topic of the next section.
Key Takeaways
 Measuring algorithmic efficiency by clock time is problematic because it does not provide a means for making comparisons independent of physical machines.
 The RAM model of computation provides a standard for comparing the approximate number of operations required by an algorithm independent of the physical machine on which it executes.
 Within reason, more operations as calculated using the RAM model will equate to more operations on a physical machine.
 It’s acceptable to take liberties with the RAM model as long as it’s not overly abused. The exact figure is not advantageous.
$T(n)$ Machine Operations as a Function
As mentioned at the end of the previous section, the number of operations an algorithm requires is relative to the input. Because of this, it makes sense to express efficiency as a function of the input. Individual algorithms define input size differently, it could mean number of bits or even a scalar value. Most commonly, size refers to number of input items such as nodes in a graph or elements in an array. By convention, $n$ is used to express input size and $T$ is used to express the growth function ($T(n)$ = algorithm efficiency function). This is best demonstrated with an example. Consider the algorithm below for squaring an array of integers.
\begin{algorithm}
\caption{Square Function}
\begin{algorithmic}
\REQUIRE $A$ = a set of integers where $\vert A \vert \gt 0$
\ENSURE $\forall i \in A \vert i = i^2$
\FUNCTION{square}{$A$}
\FOR{$i \gets 0$ \TO $\vert A \vert  1$}
\STATE $A_i = A_i \times A_i$
\ENDFOR
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
In the square function, line two contributes one operation for every item in the input array ($n$) and line three contributes and additional one per item ($n$). That’s two total operations ($n+n$) per item in the input array. The resulting runtime function is $T(n) = 2n$. Graphing the function provides a concise means of expressing the expected runtime. See the plot below.
Admittedly, the example above isn’t particular fascinating. Consider the more complex algorithm shown below. Don’t be troubled by the logic, there are more efficient ways of finding duplicates. Concentrate instead on the runtime analysis. Try to calculate the number of operations yourself before reading on.
\begin{algorithm}
\caption{Find Duplicates Function}
\begin{algorithmic}
\REQUIRE $A$ = list of strings where $\vert A \vert \gt 0$
\OUTPUT set containing all duplicate items in $A$
\FUNCTION{findDuplicates}{$A$}
\STATE $D \gets \emptyset$
\FOR{$i \gets 0$ \TO $\vert A \vert  1$}
\FOR{$j \gets 0$ \TO $\vert A \vert  1$}
\IF{$i \neq j$ and $A_i = A_j$}
\STATE $D \gets D \cup \{A_i\}$
\ENDIF
\ENDFOR
\ENDFOR
\RETURN $D$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
Let’s examine this one line at a time:
 Line 2 = 1 operation
 Line 3 = $n$ operations (one per input item)
 Line 4 = $n^2$ operations ($n$ per input item)
 Line 5 = $n^2$ operations
 Line 6 = between zero and $n^2$ operations depending on the input values
 Line 10 = one operation
To calculate the runtime, simply sum each line. Line six presents a problem
because the final answer is different depending on the values contained in $A$.
In the worst case scenario, where every item is a duplicate, the total number of
operations is $T(n)=1+n+n^2+n^2+n^2+1$ which simplifies to $T(n)=3n^2+n+2$. When
there are no duplicates, the analysis yields $T(n)=1+n+n^2+n^2+0+1$ or
$T(n)=2n^2+n+2$. Therefore, the findDuplicates
algorithm has a lower runtime
bound of $T(n)=2n^2+n+2$ and an upper bound of $T(n) = 3n^2+n+2$. These
functions are summarized below for convenience. The differences are
$\term{\text{highlighted}}$ for your convenience.
Case  Sum of Operations  Simplified 

Best  $T(n)=1+n+n^2+n^2+\term{n^2}+1$  $T(n)=\term{3}n^2+n+2$ 
Worst  $T(n)=1+n+n^2+n^2+\term{0}+1$  $T(n)=\term{2}n^2+n+2$ 
The plot below compares the upper and lower bounds of findDuplicates
to
square
.
Based on the plot above, one can reasonably assume that square
will run
quicker than findDuplicates
regardless of the programming language used to
generate the byte code or the machine executing it. The larger the input, the
greater the difference in runtime. The RAM model and bit of algebra provides a
standard language for comparison.
There is still one deficiency with algorithmic analysis as shown in this section: it’s too complex. The examples are too simplistic to be representative of real world problems. One can imagine how difficult specifying a more intricate procedure would be. Using asymptotic notation, one can express efficiency functions more concisely. This is the subject of the next section.
Key Takeaways
 The runtime of an algorithm is proportional to the size of its input; therefore, the best way to express it as a function of the input.
 The runtime of an algorithm can change based not only on input size but also the values contained in the input. In these case, separate functions can represent the upper and lower bounded runtime.
Asymptotic Complexity
Donald Knuth is quoted as saying, “[Asymptotic notation] significantly simplifies calculations because it allows us to be sloppy – but in a satisfactorily controlled way.”^{11} The goal is to express complex ideas about algorithmic efficiency in as concise a manner as possible. The result of overly meticulous figures is drowning in a sea of semirelevant details. On the other hand, excessive omissions lead to inaccuracy. The middle ground is using asymptotic notation to denote asymptotic complexity.
The most important component of algorithmic efficiency is growth rate. Loosely
consider growth rate as the shape of the resulting plot^{12}. This
“shape” is the asymptotic complexity and it represents the “sweet spot” between
precision and chaos. Removing all constants from a function does not change the
growth rate. Furthermore, removing all but the leading term does not change the
growth rate. The fact of the matter is that everything other than the leading
term is relatively insignificant for large values of $n$. This is a lot to
take in, so consider this example. Recall the upper bound of the
findDuplicates
algorithm. It is shown below with $\constant{\text{constants}}$
and nonleading $\term{\text{terms}}$ highlighted.
Removing the constants and nonleading terms results in an asymptotic complexity of $T(n) = n^2$. Plotting the original function against the asymptotic complexity reveals that they both have the same general “shape”. Consult the image below.
Below are a few more examples of converting runtime functions to asymptotic complexity. Again $\constant{\text{constants}}$ and nonleading $\term{\text{terms}}$ are highlighted for your convenience.
Runtime Function  Asymptotic Complexity 

$T(n) = \constant{12}n^6 + \term{3n}$  $n^6$ 
$T(n) = 2^n + \term{34n} + \constant{5}$  $2^n$ 
$T(n) = 3^n + \term{n^6}$  $3^n$ 
$T(n) = \constant{300}n + \constant{75}$  $n$ 
$T(n) = \constant{5} \log_2{n} + \constant{3}$  $\log_2{n}$ 
$T(n) = \constant{3}+\constant{10}\times \constant{2}$  $1$ 
There are six basic classes of asymptotic complexity which are listed in the table below. Included in the table are example algorithms; these are for future reference, do not get sidetracked studying all of them right now. Underneath the table is an image providing a general visual reference for the shape of the asymptotic complexity classes.
Name  Growth Rate  Example 

Constant  $T(n) = 1$  Hash Table 
Logarithmic  $T(n) = \log_2{n}$  Binary Tree (Search) 
Linear  $T(n) = n$  Linked List (Search) 
Quadratic  $T(n) = n^2$  Bubble Sort 
Exponential  $T(n) = 2^n$  Traveling Salesman (Dynamic Programming) 
Factorial  $T(n) = n!$  Traveling Salesman (Brute Force) 
As a side note, the final two classes, exponential and factorial, are intractable for all but very small inputs. You’ll learn more about these while studying $P \stackrel{?}{=} NP$ at the very end of all the content on this site.
PRO TIP: The six basic classes of asymptotic complexity are standard lexicon for software professionals. It’s well worth the time to commit them to memory.
Considering that asymptotic complexity ignores constants and nonleading terms, what exactly does it mean to compare algorithms of different classes? Consider, for example, a linear algorithm $\sigma$ and a quadratic algorithm $\alpha$^{13}. One can assume that $\sigma$ has a faster runtime than $\alpha$ for values of $n$ larger than some unspecified constant. This is again best described with an example. Assume the runtime of $\sigma$ is $50n$ which is asymptotically linear ($T(n) = n$) and $\alpha$ has a runtime of exactly $n^2$ which is quadratic. For values of $n$ lower than $50$, $\sigma$ executes faster than $\alpha$. This is shown graphically below.
This is often confusing upon first encounter. The Real World Asymptotic Complexity page provides additional details and examples. For now, accept that it’s reasonable to assume that higher asymptotic complexity equates to longer run times for large input sizes.
This section describes modelling both upper and lower runtime bounds. The next section describes a succinct notation for upper, lower, and tight bounded asymptotic complexity.
Key Takeaways
 Asymptotic complexity represents the growth rate of a function. It’s helpful to think of this as the “shape” of the function.
 Everything other than the leading term is relatively insignificant for large values of $n$.
 To convert a function to asymptotic complexity, delete all constants and nonleading terms.
 There are six general classes of asymptotic complexity: constant, logarithmic, linear, quadratic, exponential, and factorial.
 Larger asymptotic complexity values equate to longer run times given the input size is sufficiently large.
Asymptotic Notation
As established above, asymptotic complexity represents the growth rate of a function. In the domain of computer science, that function typically represents algorithmic runtime^{3}. The detail of precisely which runtime has thus far been neglected. It isn’t clear if $T(n)$ represents the worst, best, or average case. Luckily, asymptotic notation provides a few symbols to aid in this endeavor. Here we introduce $\Theta$ (Big Theta), $O$ (Big O  the letter, not the number zero), and $\Omega$ (Big Omega)^{14}.
$\Theta$ Big Theta  Tight Bound
The $\Theta$ symbol, pronounced big theta, represents an asymptotic tight bound. Stated simply, it indicates that the growth rate remains constant regardless of the input data. Another way of stating this is that the upper and lower bounds are within a constant factor of the specified function. It’s not possible to specify $\Theta$ for all algorithms because the upper and lower runtime bounds can differ based on the values fed to it (example shown in the next section).
Recall the analysis of the findDuplicates
function. The
worstcase runtime is $T(n)=3n^2+n+2$ and the best case is $T(n)=2n^2+n+2$. The
asymptotic complexity for both these is $T(n)=n^2$. Therefore, the statement
$\Theta(n^2)$ (read big theta of n squared) is true for the findDuplicates
function.
$O$ Big O  Upper Bound
O stands for Order of or more precisely the German word Ordnung. This is by far the most popular asymptotic symbol. In fact, it’s not altogether uncommon for people to use the terms $O$ and asymptotic complexity synonymously. $O$ represents the worstcase scenario. That is, the longest possible runtime as produced by providing pathological^{15} input data. The mathematical statement $O(n^2)$ reads “big O of $n$ squared” and it indicates that in the worst case scenario, the algorithm in question has an algorithmic complexity of $n^2$.
The previous section defined findDuplicates
$ = \Theta(n^2)$. The statement
findDuplicates
$ = O(n^2)$ is also correct because $n^2$ is also the
asymptotic upper bound. Both statements are correct; however, $O$ makes no
declaration of a lower bound while $\Theta$ denotes a stronger statement
encompassing both the lower and upper bound.
As a side note, interpreted literally, upper bound is rather ambiguous. One could say $O(n!)$ is true for virtually any algorithm. The reader is asked to ignore this pedantic detail and interpret $O$ as the smallest possible upper bound^{16}.
$\Omega$ Big Omega  Lower Bound
The last symbol is $\Omega$, pronounced big omega. As is most likely already
obvious, it represents the asymptotic lower bound of an algorithm. It is the
best possible scenario when all the input data conspires together to reduce the
number of machine operations to an absolute minimum. Again, recall that
findDuplicates
$ = \Theta(n^2)$. The statement findDuplicates
$ =
\Omega(n^2)$ is also correct because $n^2$ is the asymptotic lower bound. The
next section provides an example of an algorithm where the asymptotic upper and
lower bounds are different.
Example Analysis
The example below is meant to reinforce the concepts outlined thus far. Try to analyze the bubble sort algorithm before reading on.
\begin{algorithm}
\caption{Bubble Sort}
\begin{algorithmic}
\REQUIRE $A$ = list of numbers where $\vert A \vert \gt 2$
\ENSURE $A$ sorted in sequential order
\FUNCTION{bubbleSort}{$A$}
\STATE sorted $\gets$ false
\STATE unsortedtoindex $\gets \vert A \vert  2$
\WHILE{sorted is false}
\STATE sorted $\gets$ true
\FOR{i $\gets$ 0 to unsortedtoindex}
\IF{$A_i \gt A_{i+1}$}
\STATE swap $A_i$ and $A_{i+1}$
\STATE sorted $\gets$ false
\ENDIF
\ENDFOR
\STATE usortedtoindex $\gets$ unsortedtoindex  1
\ENDWHILE
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
The table below outlines the rough detailed analysis of the bubble sort algorithm.
Line #  Lower  Upper  $\sum$ lower  $\sum$ upper 

2  1  1  1  1 
3  1  1  2  2 
4^{*}  1  $n+1$  3  $n+3$ 
5  1  $n$  4  $2n+3$ 
6^{**}  $n$  $n^2$  $n+4$  $n^2+2n+3$ 
7  $n$  $n^2$  $2n+4$  $2n^2+2n+3$ 
8  0  $n^2$  $2n+4$  $3n^2+2n+3$ 
9  0  $n^2$  $2n+4$  $4n^2+2n+3$ 
12  1  $n$  $2n+5$  $4n^2+3n+3$ 
Total  $2n+5$  $4n^2+3n+3$ 
^{*} Continues to loop until the for loop on line 6 makes a complete revolution without triggering the if statement on line 7. The extra operation is the exit condition (sorted is true)
^{**} The number of iterations is reduced by 1 for each iteration of the outer loop by virtue of line 12. So it’s technically $\sum_{i=1}^{n} i$^{17}; however, this is a bit of minutia that’s best left rounded up to $n^2$.
The final step is to convert the runtime functions to asymptotic complexity. The bubble sort algorithm is $O(n^2)$ and $\Omega(n)$. That is, quadratic in the worst case and linear in the best case.
Key Takeaways
 $\Theta$, pronounced “big theta” represents tightly bounded asymptotic complexity. That is, $\Theta(x)$ indicates that the actual runtime is within some constant of $x$ regardless of input values.
 $O$, pronounced “big 0” represents upper bounded asymptotic complexity. That is, $O(x)$ indicates that the maximum runtime is within some constant of $x$.
 $\Omega$, pronounced “big omega” represents lower bounded asymptotic complexity. That is, $\Omega(x)$ indicates that the minimum runtime is within some constant of $x$.
Exercises
\begin{algorithm}
\caption{Binary Search}
\begin{algorithmic}
\REQUIRE A = list of values sorted sequentially
\REQUIRE value = value to search for
\OUTPUT value if it exists, otherwise 1
\FUNCTION{binarySearch}{A, value}
\IF{$\vert A \vert \leq 0$}
\RETURN 1
\ENDIF
\STATE half $\gets \lfloor\frac{\vert A \vert}{2}\rfloor$
\IF{$A_{half} = $ value}
\RETURN $A_{half}$
\ENDIF
\IF{$A_{half} \lt $ value}
\RETURN \CALL{binarySearch}{$A_{half + 1}...A_{\vert A \vert  1}$, value}
\ENDIF
\RETURN \CALL{binarySearch}{$A_0...A_{half1}$, value}
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}

Specify $O$, $\Omega$, and $\Theta$ for the binary search algorithm above.
HINT: The algorithm is recursive. The best option is to calculate the steps required for one iteration and then multiply that by the number of recursive calls.
Answer (click to expand)
The table below outlines the detailed analysis for
binarySearch
. Because the number of iterations is not yet know, that value is represented by the variable $x$. That is, $1x$ means once per iteration and 1 means once over the lifetime of the algorithm.Line # Lower Upper 2 $1x$ $1x$ 3,7^{*} $1$ $1$ 5 $1x$ $1x$ 6 $1x$ $1x$ 9 $0$ $1x$ 10,12^{**} $0$ $1x$ Total $3x+2$ $5x+1$ ^{*} Return statements terminate the algorithm so they can only be run once.
^{**} One or the other of these lines is run once for every iteration except in the case when the value is found or the input is empty.In the best case, when
value
is located at $A_{half}$ or $\vert A \vert = 0$, the function never recurses making $x=1$. Substituting 1 for $x$ in the lower bound runtime yields $T(n) = 3(1)+2$. The asymptotic complexity is therefore constant ($T(n) = 1)$The worst case, when the value is not present in the input list, is a bit more complex. In every iteration half of the search space is eliminated. Consider what happens with input values of \(A=\{1,2,3,...,100\}\) and value = 0.
 Iteration 1: $n = 100$, compare $A_{50} = 0$, recurse on $A_0…A_{49}$
 Iteration 2: $n = 50$, compare $A_{25} = 0$, recurse on $A_0…A_{24}$
 Iteration 3: $n = 25$, compare $A_{12} = 0$, recurse on $A_0…A_{11}$
 Iteration 4: $n = 12$, compare $A_5 = 0$, recurse on $A_0…A_4$
 Iteration 5: $n = 5$, compare $A_2 = 0$, recurse on $A_0…A_1$
 Iteration 6: $n = 2$, compare $A_1 = 0$, recurse on $A_0$
 Iteration 7: $n = 1$, compare $A_0 = 0$, recurse on $\emptyset$
When $n=100$, 7 iterations are required in the worst case. The size of $n$ is reduced by approximately half on each iteration. What is a number repeatedly divided by 2? The answer is $\log_2$, therefore, $x = \log_2{n}$. Plugging that number into the upper runtime bound from the detailed analysis table give us $T(n) = 5 \log_2{n} + 1$. The final asymptotic analysis is logarithmic ($T(n) = \log_2{n}$).
All that’s left to do is express the runtime using asymptotic notation.
binarySearch
= $O(\log_2{n})$binarySearch
= $\Omega(1)$ $\Theta$ is undefined because the runtime is not tightly bound
 True or False: A runtime of $O(\log_2{n})$ is guaranteed to be faster than
$O(2^n)$ in the worst case for all values of $n$.
Answer (click to expand)
False: Recall that asymptotic complexity ignores constants and lower order terms. This implies that $O(\log_2{n}) \lt O(2^n)$ for sufficiently large value of $n$. For example, consider a runtime of $T(n) = 300 + 75 \log_2{n}$. The runtime is actually higher for values of $n \lt 9$ as shown in the plot below.
As a matter of note, constants this high are exceedingly rare in practice. However, this concept is relevant to the real world especially when dealing with asymptotic complexities of a similar size.
 Without looking, try to list the six general classes of algorithmic
complexity from highest to lowest.
Answer (click to expand)
 Factorial $n!$
 Exponential $2^n$
 Quadratic $n^2$
 Linear $n$
 Logarithmic $\log_2{n}$
 Constant $1$

While Big O notation is a large part of asymptotic analysis, the term does not encompass the subject in its entirely. ↩

Do not be alarmed if that sentence seems like gibberish. The underlying concept of approximations isn’t a prerequisite for this material. If you really want more information, try the book linked below. However, it’s a considerable diversion from the topic at hand.
https://www.amazon.com/dp/3642856454/ref=cm_sw_r_tw_dp_U_x_17vREbW8AQ2BN. ↩

It’s also common to apply asymptotic analysis to memory consumption. This is typical in embedded and edge computing applications; however, it’s far less practical in standard development given the proliferation of cheap hardware. Although infrequent, it’s used to describe human resources, communication of data, randomness in cryptography, load balancer collisions, data integrity errors, labeled training data, machine learning explore/exploit tradeoffs, etc… ↩ ↩^{2}

Analytic Number Theory available on Amazon: https://www.amazon.com/dp/1168480426/ref=cm_sw_r_tw_dp_U_x_JyfREbRBTFQVT ↩

Available on Amazon: https://www.amazon.com/dp/0821826506/ref=cm_sw_r_tw_dp_U_x_dOfREbP5Z00G3` ↩

Some Problem of Diophantine Approximation, available online at https://projecteuclid.org/download/pdf_1/euclid.acta/1485887376 ↩

Big Omicron and big Omega and big Theta, available online at https://dl.acm.org/doi/10.1145/1008328.1008329 ↩

Assuming a generic single processor, Random Access Machine ↩

It’s unfortunate that “Random Access Machine” and “Random Access Memory” have the same initials. However, it’s a reality that isn’t likely to change anytime soon. ↩

Word size is the maximum number of bits that a CPU can process with a single instruction. ↩

From the Commentary section of Volume 45, Number 6 of Notices of the American Mathematical Society. It is entitled Teach Calculus with Big O. Available online at http://www.ams.org/notices/199806/commentary.pdf ↩

A helpful analogy is Plato’s theory of forms. The “shape” is the timeless, unchanging, absolute, ideal of the function that it is representing. The function itself can exist in many different incarnations, but it’s a still a representation of the “ideal form”. ↩

It’s common to use Greek letters for arbitrary variables in mathematical writing. In this context, $\alpha$ and $\sigma$ refer to algorithms. The statement, “a linear algorithm $\sigma$ and a quadratic algorithm $\alpha$” initializes the variables. Within the context of the paragraph, the symbol $\sigma$ denotes a linear algorithm and $\alpha$ denotes a quadratic algorithm. The symbols could be defined to mean absolutely anything. Think of the practice as a sort of shorthand that makes writing more concise. ↩

Most academic sources cover a few more symbols such as $\omega$ (little omega), $o$ (little o), etc…. These are purposely neglected because they are seldom used in practice. Motivated individuals will find proper treatment in any algorithms text book. ↩

A pathological case for an algorithm is the worst possible input it may have to work on, one that is so badly arranged that it produces the most amount of work. ↩

Professors love to use some form of this pedantic distinction for test questions. Those of us who live in the real world are seldom impressed by such academic details. ↩

Using the Gauss technique, this calculation is reduced to $\frac{n}{2}(1 + n)$. However, the summation seems more clear in this context. ↩