Sorting

Typically, formal algorithms courses begin with sorting. This isn’t surprising considering that it is one of the most heavily researched areas of computer science1. Sorting is so ubiquitous that all modern programming languages come equipped with one or more generic sorting primitives. As such, it’s unlikely that programmers will need to implement such algorithms. Studying them, however, is indispensable.

Sorting algorithms are an excellent pedagogical device. Because the concept of sorting is fairly intuitive, the focus shifts to algorithmic techniques. Many students struggle with the concept of asymptotic analysis until they encounter sorting algorithms. Likewise, several advanced concepts such as generic programming, divide and conquer algorithms and probabilistic algorithms are best demonstrated with sorting2. Students can rest assured that they will utilize these skills in other contexts. In short, studying sorting algorithms will endow the student with algorithm profundity.

History

As the saying goes, “necessity is the mother of invention” and the computing industry was founded on the need to quickly sort large amounts of data. In 1880, it took more than eight years to tabulate the United States census results. A Census Bureau employee, Herman Hollerith, created an eponymous tabulating machine for the 1890 census that calculated the results in less than one year. In essence, the machine’s innovation was its ability to quickly sort punch cards. The lineage of all modern computing machines is directly traceable to the Hollerith tabulating machine.

Practically every computing first was associated with sorting: the first data-processing machines, the first stored programs, the first software, the first buffering methods, and the first work on algorithmic analysis. The history of sorting and computing are virtually one.3

Don’t Repeat Yourself (DRY)

One of the first software engineering principles4 that neophytes learn is Don’t Repeat Yourself (DRY). Stated succinctly, it means that every concept should have a single authoritative representation within a system5. Keeping code DRY is a bit of an art form that takes practice. Luckily, sorting is the perfect scholastic device.

Sorting algorithms are kept DRY via two primary mechanisms: generic programming and the strategy pattern. A short introduction to each is presented below.

Generic Programming

Generic Programming is a technique that abstracts away the data type (e.g. string, integer, etc.) from the logic. That is, it allows algorithms to operate with a variable data type by accepting the type as a parameter. For instance, a generic sorting function in TypeScript6 looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
function sort<T>(data: Array<T>) {
    // sort logic here...
}

// sort strings
sort(["string 3", "string 2", "string 1"]);

// sort numbers
sort([3, 1, 2]);

// sort user-defined types
sort([{name: "string 3", value: 3}, {name: "string 1", value: 1}, {name: "string 2", value: 2}]);

Most modern programming languages have generic features. For instance, C++ provides templates, most interpreted object-oriented languages (e.g. Java, C#, and Python) have generics, and most functional languages (e.g. ML, Scala, and Julia) have parametric polymorphism. Loosely typed languages (e.g. JavaScript, Ruby, and Perl) have implicit generics because algorithms accept all types by default. C does not have an explicit generic construct; however, the functionality is easily achievable using void pointers.

A thorough treatment of generics it out of scope. Readers are strongly encouraged to investigate the generic features provided by their language of choice and use the knowledge to create the sorting algorithms outlined in this chapter. Sorting is an ideal generic programming archetypal application.

Strategy Pattern

The strategy pattern7 is a behavioral design pattern that abstracts away a piece of logic. Stated differently, it provides a mechanism to specify a portion of an algorithm at runtime. This entails accepting a piece of executable code8 from the consumer and using it to make a decision. In the case of a sorting algorithm, the strategy pattern makes relative ordering decisions that enable a single sorting implementation to realize any output order (e.g. ascending, descending, etc…)9.

By convention, implementing the strategy pattern for a sorting algorithm entails accepting a comparator, aka priority function. A comparator accepts two objects and returns their relative sort position as an integer. Specifically, a comparator returns a negative value if the first element precedes the second, a positive value if the first element succeeds the second, and zero if the elements are equal. This is a bit difficult to visualize without an example.

Consider the pseudo code below. The comparison responsible for determining the relative sort order of elements is hard coded on line 4; therefore, this algorithm is only capable of sorting in ascending order.

\begin{algorithm}
\caption{Inefficient Sort}
\begin{algorithmic}
\REQUIRE $A$ = array of numbers
\ENSURE $A$ sorted in sequential order
\FUNCTION{inefficientSort}{$A$}
    \FOR{i $\in$ A}
        \FOR{j $\gets$ 0 to $\vert A \vert$ - 1}
            \IF{$A_j \gt A_{j+1}$}
                \STATE swap $A_j$ and $A_{j+1}$
            \ENDIF
        \ENDFOR
    \ENDFOR
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}

The pseudo code below employees a sorting strategy. Notice lines 4 and 5; the comparison responsible for determining the sort order is consumer defined by the comparator.

\begin{algorithm}
\caption{Inefficient Sort w/ Strategy Pattern}
\begin{algorithmic}
\REQUIRE $A$ = array of numbers
\REQUIRE comparator = function that determines the relative sort order of items
\ENSURE $A$ sorted in sequential order
\FUNCTION{inefficientSort}{$A$, comparator}
    \FOR{i $\in$ A}
        \FOR{j $\gets$ 0 to $\vert A \vert$ - 1}
            \STATE result $\gets$ \CALL{comparator}{$A_j$, $A_{j+1}$}
            \IF{result $\gt$ 0}
                \STATE swap $A_j$ and $A_{j+1}$
            \ENDIF
        \ENDFOR
    \ENDFOR
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}

The pseudo code below demonstrates how to consume a sort algorithm that implements a sorting strategy.

\begin{algorithm}
\caption{Sort Strategies}
\begin{algorithmic}
\REQUIRE x = an integer 
\REQUIRE y = an integer
\OUTPUT a negative value if the x precedes y, a positive value if the x succeeds
y, and zero if the elements are equal.
\FUNCTION{ascending}{x, y}
    \RETURN{x - y}
\ENDFUNCTION
\STATE
\REQUIRE x = an integer 
\REQUIRE y = an integer
\OUTPUT a negative value if the x precedes y, a positive value if the x succeeds
y, and zero if the elements are equal.
\FUNCTION{descending}{x, y}
    \RETURN{y - x}
\ENDFUNCTION
\STATE
\REQUIRE A = array of integers
\FUNCTION{sorting}{A}
    \STATE \CALL{inefficientSort}{A, ascending} \COMMENT{sort array in ascending order}
    \STATE \CALL{inefficientSort}{A, descending}\COMMENT{sort array in descending order}
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}

To further elucidate the concept, the TypeScript6 code below demonstrates both generic programming and the strategy pattern.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
function sort<T>(data: Array<T>, comparator: {(x:T, y: T): number}) {
    for (var _ in data) {
        for (var i = 0; i < data.length - 1; i++){
            var result = comparator(data[i], data[i + 1]);
            
            if(result > 0) {
                var temp: T = data[i];
                data[i] = data[i + 1];
                data[i + 1] = temp;
            }
        }
    }
}

var data: Array<number> = [4, 1, 3, 2];

// sort numeric data in ascending order
sort(data, (x:number, y:number) => x - y);

// sort numeric data in descending order
sort(data, (x:number, y:number) => y - x);

// sort string in ascending order
var string_data: Array<string> = ["c string", "b string", "a string", "d string"]
sort(string_data, (x:string, y:string) => x.localeCompare(y));

// sort custom data types
type MyObj = {val: number, name: string};
var obj_data: Array<MyObj> = [{val: 4, name: "four"}, {val: 1, name: "one"}, {val: 3, name: "three"}];
sort(obj_data, (x:MyObj, y:MyObj) => x.val - y.val);

WARNING: This is a particularly inefficient implementation of bubble sort intended for demonstration only.

For the purposes of this chapter, assume each sorting algorithm is intended for general consumption. As such, generic programming and the strategy design pattern are best practices for keeping code DRY. Practicing these techniques promises to pay large dividends for aspiring software engineers as the skills are directly transferable to virtually every development effort.

Source Code

Full Repo

Relevant Directories:

Click here for build and run instructions

  1. This claim in based on nothing other than the author’s general impressions. 

  2. Examples of divide and conquer algorithms are merge sort and quick sort. Implementing quick sort with a random pivot choice is an example of a probabilistic algorithm. 

  3. Paraphrased from Donald Knuth’s The Art of Computer Programming, Volume 3 Sorting and Searching pages 383-389. The full story is wildly interesting and a full read is well worth the reader’s time. 

  4. Although the definitions are far from ubiquitous, there is a sharp distinction between programmers and software engineers. A programmer’s sole concern is transforming requirements into code whereas software engineers take a more holistic approach that encompasses qualities such as maintainability, malleability, and trade-offs. 

  5. Keep in mind that it’s easy to abuse this principle. Just because two pieces of code look similar doesn’t necessarily mean that they represent the same concept. 

  6. The choice of TypeScript is deliberate as its type system is concise and expressive. Readers with no exposure to the language should still be able to gain an adequate understanding by examining the code snippets.

    Use the TypeScript playground to compile and experiment with the code.  2

  7. See the strategy pattern Wikipedia page for a quick introduction 

  8. The means of passing executable code is different depending on the language. Typical solutions include function pointers, lambdas, and objects. 

  9. While not all sorting algorithms actually implement the strategy pattern, well-designed algorithms suited for general consumption should accept them.