Just Enough Set Theory – Set Operations (Part 2 of 3)

Welcome to the second installment of this three-part series on set theory. The first piece, Set Theory Defined (recently updated with code samples), detailed requisite foundational knowledge. It is highly recommended that readers begin there if they haven’t already.

The first piece in this series introduced sets and exhibited how ES6 arrays are analogous to them. It also depicted how to transform, or map, a set into a related set. This post expands on set theory by probing into set operations.

NOTE: All code samples are written in ES6 and are therefore not likely to execute directly in a browser. The best option is to use Node or transpile the excerpts using either Babel or TypeScript. The working code is available on GitHub along with execution instructions.

Empty Sets

Empty sets are a rather mundane topic, but nonetheless worth mentioning. As the name implies, they are simply sets that have no elements. They are also commonly referred to as null sets. Mathematically, empty sets are represented as either `\emptyset` or `{}`. The concept relates to empty arrays in software.


The term cardinality sounds impressive; however, it’s simply the number of elements in a set. The mathematical representation of a set with three elements is as depicted in Figure One – Cardinality.


In JavaScript, the cardinality of an array is its length. See the code below.

const someSet = [1, 2, 3, 4, 5]; 
const cardinality = someSet.length; 
// cardinality = 5


Subsets are relatively easy to explain, yet have far reaching implications. A subset is a portion of a larger set. For instance, consider the set of all animals (`A` ). The set of all dogs (`D` ) is a subset of the animal set because although every animal is not a dog, every dog is an animal. The mathematical notation for subsets is as follows: `D\subseteqA`. Another way of mathematically expressing the subset relationship is `\forall x(x\inD->x\inA)`. That looks absurd, but the premise is that for any (`\forall`) element (`x`) in `D` , it is implied (`->`) that the element (`x`) also exists in `A`.


Subsets are often taught with Venn Diagrams. See Figure Three – Venn Diagrams for an example. Admittedly, this account of subsets is a bit prosaic. However, the final post in this series relies heavily on the concept so it bears belaboring the point.


ES6 has a built-in filter method on the array object that enables easy access to subsets. Filter takes a predicate as an argument. Recall from the first article that a predicate is a function that takes a single argument and returns a Boolean response. The filter method applies the predicate to each item in a set and creates a new set that includes the items where the predicate returned true. See the code below.

const animals = [
    {name: "Tom", type: "Cat"},
    {name: "Jerry", type: "Mouse"},
    {name: "Pluto", type: "Dog"},
    {name: "Scooby Doo", type: "Dog"}];

const dogs = animals.filter(a => a.type == "Dog");
// dogs = [{name: "Pluto", type: "Dog"}, {name: "Scooby Doo", type: "Dog"}]


The term summation is a bit misleading because it implies simply adding elements together, however it’s a more powerful concept. Summation applies a function to each element of a set reducing it to a single value. `‎\sum_{x \in S}f(x)‎‎` is the mathematical notation representing the algorithm where `S` can be any set and `f(x)` can be any function. Consider Figure Four – Summation. Given the set `A`, each element in the set is multiplied by two and added together.


ES6’s reduce method of the array object is comparable to summation. Aptly named, reduce applies a function to each member of a set reducing it to a single value. It accepts two arguments: a function and an optional starting value. The function accepts an accumulated value and the current item. The state of the accumulated value after all items are processed is the final return value. The code below is the same process detailed in Figure Four – Summation.

const someSet = [1, 2, 3];
const sum = someSet.reduce((acc, x) => acc + x * 2, 0);
// sum = 12

Reduce is useful for many operations beyond mathematical functions. The code below utilizes it to extract email addresses from a set of users.

const users = [
    {id: 1, email: "email@email.com"},
    {id: 2, email: "email2@email2.com"},
    {id: 3, email: "email3@email.com"}];

const emails = users.map(u => u.email).reduce((acc, x) => `${acc};${x}`);
// emails = "email@email.com;email2@email2.com;email3@email.com"

This above doesn’t do the reduce method proper justice because its efficacy is virtually endless. There are many more options that are outside the scope of this feature. The reader is highly encouraged to find more information on Mozilla’s excellent JavaScript reference.

Power Set

Power sets are something every programmer has to deal with at some point in his/her career, even if they can’t formally identify them by name. In mathematical parlance, power sets are denoted as `P(A)`. A power set is the set of all subsets including the empty set and itself: more succinctly, all possible set combinations. A power set always contains `2^n` elements where `n` is the cardinality of the original set (`|P(A)|=2^(|A|)`).

Power sets are difficult to conceptualize without an example. Figure Five – Power Set depicts a set with three elements. The power set is all possible combinations of the three elements. The result is a set with a cardinality of eight (`2^3`).


Unfortunately, there isn’t an innate JavaScript method for creating power sets. However, that’s an easy problem to overcome given some ingenuity. See the code below.

const someSet = [0, 1, 2];
const powerSet = someSet.reduce((acc, x) => [...acc, ...acc.map(y => [x, ...y])], [[]]);
// powerSet = [[], [0], [1], [1,0], [2], [2,0], [2,1], [2,1,0]]

The code above is a bit intimidating at first glance so it merits additional explanation. The power set always contains an empty set, so the second argument to the reduce method is a set that contains nothing but that. This is the starting value. When the function acts on the first item in the set, the value of acc is [[]] and the value of x is 0. The result of concatenating the current item to each item in acc is concatenated on to the value of acc making it [[], [0]]. The same algorithm is applied to each item in the set. This is difficult to envisage, so the code below details essentially what happens upon invocation.

const ps = (acc, x) => [...acc, ...acc.map(y => [x, ...y])]; 

// First element
let acc = ps([[]], 0);
// acc = [[], [0]]

// Second element
acc = ps(acc, 1);
// acc = [[], [0], [1], [1,0]]

// Third element
acc = ps(acc, 2);
// acc = [[], [0], [1], [1, 0], [2], [2, 0], [2, 1], [2, 1, 0]]

The reader is highly encouraged to review this section multiple times until the concept solidifies.


The post outlined a few useful set operations. ES6 uses the reduce method to apply the concept of summation to sets. A power set is a set of all possible set combinations. Although there is no built in ES6 functionality for this, it’s an easy algorithm to create. Make sure to come back for the final post entitled When Sets Collide. It is by far the most useful in the series covering set operations that act on multiple individual sets.

Just Enough Set Theory – Set Theory Defined (Part 1 of 3)

Set theory is incredibly intuitive and has many practical applications in software engineering. In f

Set theory is incredibly intuitive and has many practical applications in software engineering. In fact, any professional programmer without an understanding is at a disadvantage. Unfortunately, many in the industry relegate it to the purview of mathematicians. This is understandable because most material on the subject delineates set theory with first order logic as a basis for math. The good news is that it doesn’t have to be this way. As this series demonstrates, it is accessible to anyone regardless of background.

The three articles in this series aim to introduce set theory, expound upon set operations, and demonstrate the learning using JavaScript (ES6). The goal is to provide the reader with actionable knowledge to improve his/her software skills without a surfeit of superfluous details. This first installment describes the theory in order to provide a firm foundation for future practical application.

NOTE: All code samples are written in ES6 and are therefore not likely to execute directly in a browser. The best option is to use Node or transpile the excerpts using either Babel or TypeScript. The working code is available on GitHub along with execution instructions.

What is Set Theory

The inception of set theory dates back to the nineteenth century with Georg Cantor. On the surface, it’s brilliantly simple. A set is simply a collection of unordered objects. In mathematical parlance, objects contained in a set are known as members or elements. An element can be literally anything, including another set. Sets are typically depicted as objects inside curly braces and are denoted by capital letters. For instance, `A= { 1,2,3 }` is the mathematical representation of the set `A` with the members `1`, `2`, and `3`. Set membership is signified as: `1\inA`. Figure One – Sets illustrates these symbols.


Set theory relies on FOPL (First Order Predicate Logic) to construct sets. Expanding on the definition above, sets are a collection of objects that satisfy a predicate. A predicate is a function that accepts a single argument and returns a Boolean (true or false) value. For instance, the set of all dogs has the predicate `IsDog(n)` In other words, elements of a set share some arbitrary property. FOPL is fascinating, but not particularly relevant to this article. A general acumen of predicates is sufficient for comprehension of this material. A cursory web search for First Order Logic will present sufficient resources for the curious reader.

Set Mapping

There are a few interesting operations that can be performed on sets, most of which are covered in the next installment. However, mapping from one set to another is germane to a foundational understanding of set theory. A set is transformed, or mapped, into another related set via the use of a function.

A mathematical function is analogous to a software function with added constraints. They are similar in that they accept an input and return an output. The difference is that a mathematical function can only accept a single input, must return an output, are determinate, and side effects are impermissible. Sources often refer to functions as relations between sets because they map a member of a set to member of another set. While mathematical functions are relevant to the understanding of set theory, programmers need not be particularly concerned with this concept. The significant notion is that of a function in general, which should be apparent to most software professionals. As an aside, further understanding of mathematical functions is particularly useful for other programming concepts.

Mapping works by applying a function to each member of a set and placing the output into another set. Figure Two – Set Mapping illustrates the concept. This is particularly applicable to programming, so understanding is imperative.


Given the information above, the impetus of the map method of arrays in JavaScript (ES6) is obvious. Arrays are a convenient analog to sets. See the code sample below.

const wholeNumbers = [1, 2, 3];

const evenNumbers = wholeNumbers.map(n => n * 2);
// evenNumbers = [2, 4, 6]

The above isn’t exactly a realistic scenario: generating an array of doubled numbers isn’t auspicious. A more real world use of the map method is to modify complex objects. See the code below.

const people = [{id: 1, name: "Ada Lovelace"}, {id:2, name: "Charles Babbage"}];

const names = people.map(p => p.name);
// names = ["Ada Lovelace", "Charles Babbage"]

Map is exceedingly suitable for many use cases. Understanding set theory elucidates its utility.


As a fair warning, the remainder of this post provides a prospectus of the areas of set theory that aren’t directly applicable to everyday programming activities. Although intriguing, the uninterested reader should feel free to skip to the conclusion.

To Infinity and Beyond

The conception of sets isn’t exactly revolutionary. Kindergarten pedagogy teaches children to categorize objects into sets. It’s simple and intuitive. The innovation is revealed by examining sets of infinite size.

Conceptually, there are two methods for comparing the sizes of sets. The first is to enumerate the members and compare the resulting counts. This is blindingly obvious; however, it has a substantial flaw. It isn’t possible to calculate the number of members in an infinite set. As a second option, Cantor postulated that if it is possible to create a function that maps the first set to the second set without skipping members, then the sets must be of equal size.

The canonical example is to compare the set of natural numbers (whole numbers excluding zero) to the set of even natural numbers. Figure Three – Counting Sets demonstrates the concept. Although it’s not exactly intuitive, and is often controversial, this establishes that the two infinite sets are equally sized. This might lead one to believe infinity is simply infinity. However, it’s a bit more abstruse.


Consider the set of real numbers (natural, rational, irrational, and transcendental) between one and two. Think back to the number lines that are an inexorable part of preparatory education and envision a set encompassing all numbers on the line between one and two. Regardless of the placement of two distinct points on the line, it is possible to find a smaller number between them. The interesting thing about this infinite set is that it is not possible to create a function that maps the set of natural numbers to this set without skipping members. This implies that although both sets are infinite, the set of real numbers between one and two is actually larger than the set of all natural numbers. Cantor verified this in a beautifully elegant proof known as Cantors Diagonalization.

While theoretically straightforward, the notion of multiple sizes of infinity is a bit vexatious. John von Neumann once said, “in mathematics you don't understand things. You just get used to them.". This concept holds true to his conjecture. The good news is that the notion of different sizes of infinity is only applicable in the most esoteric areas of computer science. The majority of programmers need not concern themselves with it.

Don’t be Naïve

Set theory took the mathematical world by storm with its simplicity and elegance. Many foundational theories are built on the cornerstone of set theory. However, it contains a substantial flaw which could have spelled doom except that mathematicians couldn’t deny its utility. Therefore, it split into two separate theories known as naïve and axiomatic set theory. It’s similar to how general and special relativity exist simultaneously.

Naïve set theory is sufficient for many applications. In fact, it is adequate for almost all software engineering use cases. Axiomatic set theory does apply to some esoteric areas of computability and logic. However, it is far removed from the greatest majority of programming tasks.

As for axiomatic set theory, it is an extension of the original theory that introduces several axioms that address flaws. The underlying issue with naïve set theory is that a paradox can arise when defining predicates. The most popular demonstration of the defect is Russell’s Paradox. Succinctly stated: does the set of all sets that do not include themselves include itself? If the answer is yes, then the definition is contradictory because it does contain itself. If the answer is no, then the predicate is likewise inconsistent because it cannot contain all sets that do not contain themselves. Don’t worry if this seems perplexing, it often requires reflection.

The finer points of axiomatic set theory are beyond the scope of this article. However, the intrigued reader should perform a web search for Zermelo–Fraenkel set theory to learn more. Regardless of its applicability to programming, it’s quite captivating.


The most pertinent programming related concepts detailed in this post are sets and set mapping. A set is simply a collection of objects. Set mapping is applying a function to each member of a set to produce a related set. The following pieces in this series expound on how these concepts are applicable.

Set theory is surprisingly simple yet it reveals some mystifying truths such as the fact that there are multiple sizes of infinity. There are essentially two branches of set theory: naïve and axiomatic. Naïve set theory is sufficient for the majority of software engineering applications.

Make sure to come back for the next article. With the foundational concepts out of the way, the post delves into set operations which provide valuable mental models for programmers. These are concepts that will improve your development abilities.

Coding Theory (Part 3 of 3) – Demonstration

Welcome to the final installment of this three-part series on coding theory. If you have not had the opportunity to read the first two pieces, it is highly recommended that you do before continuing on. They are available here:

Having covered cogent concepts in previous posts, this article aims to dive into a demonstration which consists of defining a code using a generator matrix and correcting errors using a parity check matrix. The example is a bit contrived and thoroughly simplified for the sake of brevity. However, the intent is not to provide an exhaustive resource; it is to familiarize the reader with coding theory and hopefully entice him/her into further inquiry.

As a fair warning, this post contains a modest amount of high school/first year college level math. An understanding of Boolean algebra (integer arithmetic modulo two) and matrices are a welcomed asset to readers. However, learners less accustomed to these concepts can still follow along and simply have faith that the math works out as advertised. A cursory overview of relevant math concepts is provided where appropriate.

Generator Matrix

A generator matrix is a simple, yet particularly clever means of generating codes. They are comprised of an identity matrix combined with an arbitrary matrix. Multiplying a message in row matrix form by a generator matrix produces a codeword. This is difficult concept to grasp without an example. Therefore, the remainder of this section is step-by-step instructions for creating a generator matrix that will produce a code with eight codewords.

The first step is to define an identity matrix which is a matrix that any given matrix can be multiplied by without changing the value of the given matrix. This is accomplished by setting the principal diagonal elements to one and leaving the rest as zero. See figure one for an example. The matrix is of order three because a three-digit binary string can represent eight possible values which is the number of desired codewords.


The next step is to define an arbitrary matrix (denoted by `A`). The size of the matrix determines the size of generated codewords. If `m` is the size of the identity matrix, and `n` is the desired length of codewords, then the arbitrary matrix should be of size `m-by-(n-m)`. Six digit codewords suffice for the purposes of this article; therefore, the arbitrary matrix must be sized three by three (six-digit length minus three-digit identity). Figure two is `A` as used by the remaining examples.


The only thing left do is combine the two matrices above together to form `G`. It’s as simple as placing them side by side as shown in figure three. 3-figure3

With the generator matrix (`G`) in hand, generating codewords is trivial. Multiplying any three-digit binary message in row matrix form produces a codeword. For example, the message `011` becomes the codeword `011110` as shown in figure four. Notice the codeword is the original message with three parity bits appended. This happens because the generator matrix begins with an identity matrix.


Examining the Code

The example code (`C`) is comprised of every number between `000` and `111` multiplied by the generator matrix as shown in figure 5. The example code has a couple of notable attributes. The first is that the sum of any two codewords is yet another codeword. This is known as a linear code. Another extraordinary characteristic is that the minimum hamming distance of the code is equal to the minimum weight of the nonzero codewords. Weight is the number of ones within a codeword. The reasons for this are beyond the scope of this post; it is mentioned to seduce the reader into continued exploration. Examining the code reveals that the minimum hamming distance is three (`d(C) = 3`).


With the code in hand, it’s possible to calculate the equations outlined in part two of this series. First, it’s pertinent to know how many errors the code is capable of detecting and correcting. The previous paragraph defines the minimum hamming distance as three. Figure six demonstrates that the example code is capable of detecting a maximum of two errors and correcting a maximum of one.


Another relevant equation introduced in the second installment of this series is the Hamming bound. Recall that the `|C|` denotes the upper bound number of codewords, `n` is the length of the codewords, and `k` is the maximum number of errors the code is capable of correcting. Figure seven demonstrates plugging these variables into the Hamming bound equation.


The remainder of this post deals with detecting and correcting errors after transmission. Parity check matrices, described in the next section, are a counterpart to generator matrices that facilitate error detection and correction.

Parity Check Matrices

Parity check matrices are derived from generator matrices. They are used during the decoding process to expose and correct errors introduced during transmission. Multiplying a parity check matrix by the transpose of a codeword exposes errors. The concept is best elucidated by demonstration.

A parity check matrix (denoted as `H`) is comprised of the transpose of the arbitrary matrix combined with the identity matrix. As a refresher, the transpose of a matrix is simply the matrix flipped across it’s diagonal so that the (`i`,`j`)th element in the matrix becomes the (`j`,`i`)th element. Figure 8 shows the parity check matrix that corresponds to the generator matrix from the running example.


Multiplying the transpose of any valid codeword by the parity check matrix produces a zero-value result as demonstrated in figure nine. The mathematical rational for this this is beyond the scope of this post. However, it is a worthwhile endeavor for the reader to research this further.


Changing any of the bits in the codeword produces a non-zero result which indicates an error. Consider `011010`, as shown in figure ten. The result does not equal zero so at least one of the bits is erroneous.


After identifying an inaccurate codeword, it may be possible to correct it using `H`. Continuing with the example above; the product of the codeword and `H` is equal to the forth column in `H`. This indicates an error in the fourth bit and changing the fourth bit produces the correct codeword. See figure eleven for an illustration.


Because the example code is only capable of correcting a single error, changing more than one bit generates an irrecoverable codeword. However, with a more complex code, it is possible to correct multiple errors using the distinct sum of `H` rows and the nearest neighbor method. Again, the reader is encouraged to expand on this with more research.


This concludes the three-part series on coding theory. Coding theory is a fascinating field that enables the reliable transfer of information in spite of the shortcomings inherent in computing machinery. Richard Hamming, a pioneer in the field, devised ingenious codes that allow a maximum amount of data recovery using a minimum amount of redundancy. His codes are still widely used and have many practical applications. This post demonstrated Hamming’s methods by providing step-by-step instruction for generating codewords using a generator matrix. Additionally, it illustrated how to derive a parity check matrix from the generator matrix and use it to correct errors.

Thank you for taking the time to read this series of articles. As always, I’m happy to answer any questions or embellish details in future posts upon request. I hope this series has enthused the reader into more acute exploration.

Coding Theory (Part 2 of 3) – Perfect Error Correction


Welcome to the second installment of this three-part series on coding theory. If you have not had the opportunity to read the first piece, it is highly recommended that you do before continuing on. It is available here: http://hideoushumpbackfreak.com/post/2016/07/30/Coding-Theory-(Part-1-of-3)-Coding-Theory-Defined

It’s rare to find concepts simple yet adroit at the same time. However, Hamming’s contributions to coding theory “fits the bill”. This post begins with a brief introduction to Hamming and a short history lesson before diving into Hamming Distance, and Perfect Codes. Additionally, it delves into a few simple math concepts requisite for understanding the final post. These concepts all come together in the final installment by providing examples of how to generate and decode the most powerful and efficient error correcting codes in use today.

Richard Hamming

Richard Hamming was an American mathematician that lived from 1915 thru 1998. Early in his career, he programmed IBM calculating machines for the infamous Manhattan project. Concerned about the pernicious effect he may be having on humanity, he abandoned the Manhattan project to work for Bell Laboratories in 1946. Hamming’s tenure at Bell Laboratories was illustrious. His contributions during that time include Hamming codes, Hamming matrix, Hamming window, Hamming numbers, Hamming bound, and Hamming distance. The impact of these discoveries had irrevocable implications on the fields of computer science and telecommunications. After leaving Bell Laboratories in 1976, Hamming went into academia until his death in 1998.

The Inception of Error Correcting Codes

The world of computation was very different back in 1947. At that time, producing modest (by today’s standards) calculations could take days. Just like today, machines of yore operated on bit strings with parity bits to ensure data fidelity. However, upon detecting erroneous data, the machines had no choice but to halt computation and return an error result. Imagine the frustration of being 47 hours into a 48-hour program and having it error out due to an anomaly introduced by noise. This is the dilemma Richard Hamming faced.

In 1950, Hamming published a paper that would serve as the basis for modern coding theory. He postulated that it was possible to not only detect, but correct errors in bit strings by calculating the number of bits disparate between valid codes and the erroneous code. This came to be known as Hamming Distance.

Hamming Distance

The Hamming distance between two codewords is simply the number of bits that are disparate between two bit strings as demonstrated in figure one. Typically, hamming distance is denoted by the function `d(x,y)` where `x` and `y` are codewords. This concept seems incredibly mundane on the surface, but it’s the inception of a whole new paradigm in error correcting codes; specifically, Nearest Neighbor error correction.

2-figure1Nearest neighbor error correction involves first defining codewords, typically denoted as `C`, that are known to both the source and sink. Any received codeword not contained in `C` is obviously the result of noise. Upon identifying an erroneous codeword, nearest neighbor decoding calculates the Hamming distance between it and every codeword contained in `C`. The codeword with the smallest Hamming distance has a high probability of being correct. See figure two.

2-figure2The quality of error correction is heavily dependent on choosing efficient codewords. `d(C)` denotes Minimum Hamming Distance: that is the smallest hamming distance between any two code words contained within `C`. If a code has a minimum hamming distance of one (`d(C)=1`) then nearest neighbor error correction is futile. If it has a large hamming distance, such as 10 (`d(C)=10` ), then error correction is powerful.

Hamming represented the relationship between minimum hamming distance and the quality of error correction with two concise equations. A particular code can detect a maximum `k` errors in a codeword if `d(C) <= k + 1` and correct a maximum of `k` errors if `d(C) >= 2k + 1`. For example, a code with `d(C) = 10` can detect a maximum of nine errors and correct a maximum of four as demonstrated in figure 3.


An important fact to note is that the equations above represent the maximum bounds of error detection and correction. It is possible to create a code with a minimum hamming distance that falls short of these bounds. In reality, it’s difficult to create a code that effectuates the bounds. There are special codes, known as Perfect Codes, that meet this criterion as well as demonstrate some other desirable traits.

Perfect Codes

Generating an efficient code is a formidable task because it involves three competing principals as shown in figure four. First, short codewords reduce the size of data transmissions. Likewise, as shown in the previous section, the greater the minimum Hamming distance, the greater the codes ability to detect and correct errors. However, there are a limited number of codewords of a specified length that also have a specified minimum Hamming distance.


The Hamming Bound equation demonstrates these competing principals concisely. The equation is shown in figure five, where `|C|` is the upper bound number of codewords, `n` is the length of the codewords, and `k` is the maximum number of errors it is capable of correcting. Any code that achieves the upper bound of the equation is known as a Perfect Code. As a side note, Richard Hamming developed a perfect code known now as Hamming Codes.



This concludes the second installment of this three-part series on coding theory. Richard Hamming created error correcting codes that addressed the problem of brittle computations in the 1950s. However, they still permeate modern computer science. The concept of Hamming Distance incepted Nearest Neighbor error correction. The quality of error correction is dependent on the Hamming Bound, which is an equation that expresses the three competing goals of an effective code.

Make sure to check back for the final installment of this series. To date, the posts have covered mostly supporting concepts. However, the concluding piece agglomerates all ideas into a cohesive whole with an example. As always, thank you for reading and feel free to contact me with questions or comments.

Coding Theory (Part 1 of 3) – Coding Theory Defined

Coding theory stands as a cornerstone for most of computer science. However, many programmers today

Coding theory stands as a cornerstone for most of computer science. However, many programmers today have a diminutive understanding of the field at best. This three-part series of blog posts describes what coding theory is and delves into Richard Hamming’s contributions. Although derived in the 1950s, Hamming’s ideas are so visionary that they still permeate modern coding applications. If a person truly comprehends Hamming’s work, they can fully appreciate coding theory and its significance to computer science.

This first installment of the series defines coding theory, error detecting codes, and error correcting codes. These are all important supporting concepts required to fully appreciate future articles. Although this is aimed at the novice, it will provide a good review for the more seasoned computer scientist.

Coding Theory Defined

Computer systems store information as a series of bits. Coding theory is the study of encoding, transmitting, and decoding said information in a reliable manner. More succinctly: moving bits with fidelity. This appears elementary from the cursory view. What’s difficult about transferring ones and zeros across some communications medium? As figure one illustrates, the answer is the noise introduced by the communications channel.


Recall that computer systems store data as a strings of bits. Each bit has two possible values. These values are often represented as 1/0, true/false, on/off, or even high/low. Regardless of the nomenclature used to represent them, they are nothing more than the absence or presence of a voltage from a computer’s perspective. Noise, including everything from electrical interference to a scratched disk surface, can make these values ambiguous to a machine.

As a grossly simplified example, suppose a computer expects either a zero or five-volt signal. A zero-volt signal goes into one side of the channel and distortion causes 2.6 volts to come out the other side. Therein lies the ambiguity. The machine can only interpret the signal as a one and rely on coding techniques to sort it out.

One important point to remember is that coding theory is requisite due to shortcomings in modern computer hardware. If contemporary machines could transmit data reliably, coding theory would be superfluous. It’s not that building such equipment is impossible. The technology to build reliable machines exists. It’s just not practical. Such computers would be slow and exorbitantly expensive. Richard Hamming stated: "We use coding theory to escape the necessity of doing things right because it’s too expensive to do it right" (Source).

Coding theory addresses the inadequacies of machines by building fault tolerance directly into transmitted data in the form of error detecting and error correcting codes.

Error Detecting Codes

Aptly named, error detecting codes enable receivers to determine if accepted information contains errors. This is possible by appending calculated verification data to the data source before transmission. The sender calculates verification data by running the source data through a deterministic algorithm which typically produces either a hash, checksum, or parity bit. Upon receipt, the receiver runs the same algorithm on the information received. If the data produced by the receiver matches the verification data, it’s safe to assume the accepted information is unadulterated. Figure two shows the process more concisely.


The concept of using codes for error detection is actually quite old. Scribes in antiquity would sum the number of words in paragraphs and pages and use those values to detect transcription errors. In that case, the original scroll is the source and the produced scroll is the sink. The scribe himself is the communication channel and source of noise. The algorithm used to generate the verification data is the process of counting the words. Obviously, error detecting is more complex in modern times but the general principal remains unchanged.

Error detection goes beyond simply detecting errors introduced by noise; it can also detect information tampering by malicious third parties. Although fascinating, all of that minutiae is beyond the scope of this post. For brevity, this article explores the simplest type of error detecting codes: parity bits. This is the only error coding concept particularly germane to future installments in this series.

A parity bit (aka check bit) is a verification bit appended to the end of a codeword. The parity bit equals zero if there are an even number of ones and one if there are an odd number. Figure 3 illustrates this concept. As a side note, what is described above is technically an “even” parity bit. The bit value will be the opposite in the case of an odd parity bit. The remainder of this article assumes even parity bits.


Parity bits can only detect an odd number of errors. Consider the seven bit codewords above. The parity bit is only useful if there are one, three, five, or seven errors. If there are two, four, or six error, the parity bit indicates success. One method for mitigating this is by arranging the data into a matrix and generating parity bits in multiple dimensions as shown in figure four.


The example shown above has two dimensions; however, it’s possible to add parity bits in unlimited dimensions. While it’s fairly easy to imagine a matrix with three dimensions, it’s arduous to visualize a matrix with more than that. Regardless, it’s mathematically feasible. A future article in this series examines this in more detail.

Error detecting codes inform the receiver of errors during the transmission of information. Knowing there is an error, the receivers can easily make a request to resend data. Many systems work exactly like this. The next section explores how coding theory takes this one step farther by not only detecting errors, but correcting them as well.

Error Correcting Codes

The previous section describes how receivers request a resend upon detecting errors with error detecting codes. Unfortunately, there are applications where this isn’t an option. For instance, imagine trying to communicate with a satellite in deep space when the transmission process could take months. Another example is data stored on a disk that may degrade over time. It’s impossible to ask for a retransmission from the source because the source itself is corrupted. Yet another example is broadcast systems where there is no backchannel to facilitate resend requests. These are just a few examples. For such cases there are error correcting codes which not only inform the receiver of errors, they contain enough information to fix them.

The simplest form of error correcting codes are repetition codes. As the name implies, the message is simply replicated multiple times. The decoder determines the correct bits by choosing the majority. Figure five illustrates the concept. The amount of duplication is implementation dependent; however, less than thrice is not effective.


There are more elegant and efficient error correction paradigms than repetition codes. However, they are still in use in some modern system due to the ease of implementation. The main take away from this section is simply what error correcting codes are. Future installments examine them in greater detail.


This concludes the first installment of this three-part series on coding theory. This article introduced coding theory, error detecting codes, and error correcting codes. In short, the concepts required to fully appreciate future posts. Future installments dig into details of coding theory and explore the works of Richard Hamming, who revolutionized the field in the 1950s.

Make sure to come back for the next article because that’s when things start to get exciting. The post digs into some fascinating math and the more ingenious methods used for error correction. As always, thank you for reading and feel free to contact me with questions or comments.