When we talk about cybersecurity, the terms DES, RSA, AES and so on come to mind. In fact, these are all common encryption algorithms used in today's digital world. However, there exists many other techniques in the history of cryptography. Dating back from World War II, ENIGMA was an encryption device used for protected communication purposes. Going a bit further back in time, the Babington Plot and the Vigenère Cipher were ciphers used in the 15th and 16th century. Another classic example is the Caesar Cipher from the Roman Age. Although the majority of these techniques have become obsolete with time, it could still be interesting to understand how they worked and how they were eventually broken. We will explore several classical cryptography concepts and mechanisms, along with some concrete examples to illustrate their applications.
Monoalphabetic ciphers
Monoalphabetic cipher is a type of cipher in which each alphabet character mapping is fixed and unique. For example, if "A" is encrypted as "Z", all letters "A" in the message are encrypted as "Z".
Shift cipher
The shift cipher creates the mapping between the raw and encrypted characters by shifting the alphabet by \(k\) positions. The main idea revolves around the concept of modulo, where we rotate the positions of the alphabet in order to create an encrypted message. The well-known Caesar Cipher utilized a shift of \(k=3\), illustrated by the encryption below:
Essentially, there exists 26 different ways of shifting the original positions of the raw message in such a way that it becomes encrypted. We can represent mathematically the encryption process of the shift cipher as follows:
$$ ENC_{k}(m_1, m_2, ..., m_l) = c_1, c_2, ..., c_l$$ where $$c_i = (m_i + k) \ mod \ 26 $$In python, we could write a simple function to encrypt a plaintext message using the shift method as follows:
# Encrypt message by shifting k positions
def encrypt(self, message, k):
k %= 26
trans = str.maketrans(ALPHABET, ALPHABET[k:] + ALPHABET[:k])
return message.upper().translate(trans)
The decryption process is almost identical, only difference is the direction of \(k\):
# Decrypt message by shifting -k positions
def decrypt(self, message, k):
k %= 26
trans = str.maketrans(ALPHABET, ALPHABET[-k:] + ALPHABET[:-k])
return message.upper().translate(trans)
How do we break the shift cipher? Since there are only 26 possibilities, we can use a brute-force approach. If we had a pen, paper, and enough free time, we could write out all possible shifts of a given encrypted message and manually identify which decrypted message makes the most sense. If we are a bit more clever, we could write some code that would compute all the possible shifts for us to verify. However, is there a way to automate the entire process? For instance, is there a way to program a computer to find the correct decrypted message on its own? The answer to this problem is to leverage relative frequencies of letters to break the cipher.
It is possible to obtain a good representation of the frequencies of each letter in the alphabet by actually counting the occurrences from a source. For instance, there was an analysis where the Concise Oxford dictionary was used to obtain the numbers. Another analysis involved counting letters from plain text containing a large number of words, for which the results are openly available and will be used for our demonstration.
Now that we know the average distribution of each letter in English, we have a way to statistically identify whether a given chain of characters belongs to the language or not. Let \(f_i\) be the frequency of the \(i^{th}\) letter in the alphabet and \( \sum f_{i}^{2} \) be the sum of squares, which is equal to \(0.065\) in English. What this means is that given a text in English that is large enough, computing the sum of squares of each letter's frequency would be close to that number. This is essential and will allow us to automate the process of breaking the code!
Now let \(h_i\) be the frequency of \(i^{th}\) letter in the encrypted text after shifting by \(k\) positions, which can be obtained by simply counting each letter's number of occurrences. Given that the plain text is encoded with the shift method, there must exist a \(k\) for which \(f_i = h_i((i + k) \ mod \ 26)\), as the size of the corpus tends to infinity. Therefore, if we compute for each \(k\) the \( \sum f_{i}h_{i}((i + k) \ mod \ 26)\), the result closest to \(0.065\) would be the correct one. However, we need to keep in mind that the lenght of the text must be long enough.
# Display temporarily unavailable. See Github repository for full code.
Substitution cipher
The substitution cipher consists of mapping each letter of an alphabet to another letter or itself, while maintaining a one-to-one relationship between all characters. For instance, starting with the letter \(A\), it can be substituted for any of the \(26\) letters in the English alphabet. Then for \(B\), we can select any letter except the one previously used for \(A\), which means there are \(25\) options. Basically, the number of possible ways of encrypting a text consist of all possible permutations of the alphabet. More precisely, there are \(26! \approx 4 \cdot 10^{26}\) possible mappings!
Polyalphabetic ciphers
Coming soon.