In this post I’ll take a very brief and broad look at some of the core principles and fundamental aspects of cryptography. As a practice it has been around for thousands of years, but as a rigorous science its history is much more recent, going back only a matter of decades. It will come as no surprise that the growth of the Internet, or perhaps more accurately of data communications, which predates the Internet by a decade or three, has been one of the main drivers of the growing need for cryptography. The Internet is now so pervasive that there is barely a business in the developed world that does not rely on effective encryption for its survival.
So, this post is about security and the role played by cryptographic technology in data security. It’s a bare introduction, but I hope to add links to some blog posts which examine some of the more important areas more deeply, so look out for them. The ability to secure data against unauthorised access while it is in storage or in transit is now a critical function of information technology. Indeed, if any form of e-commerce such as credit card processing, equities trading or general banking data processing were compromised, the unfortunate organisations could face losses running to billions of dollars/pounds/whatever, not to mention the devastating cost of lost confidence going forward.
So let’s look at a few high-level topics to get going.
The fundamentals of information theory were famously defined by none other than the father of the information age, Claude Shannon, in his seminal 1948 work, A Mathematical Theory of Communication. In this paper he defines the problem that information theory sets out to solve and establishes many of its mathematical results besides. More importantly, his work acted as a foundation upon which, time after time, the foremost minds of our science have built, giving us the mature and usable communications mathematics we have today. Shannon’s paper did not come a moment too soon: this was an age in which the atomic bomb had just been developed, we were still a decade from the polio vaccine and Sputnik, and, most pertinently, transistors were only just beginning to replace vacuum tubes in the design of digital machines (the design of digital circuits itself owes a great deal to Shannon’s earlier work on switching circuits). Information theory gave engineers, mathematicians and scientists the tools they needed to analyse how well their machines were transmitting data to and from one another.
Figure 1 above details Shannon’s communication model. The information source produces a message. The transmitter creates the signal to be transmitted through a channel; the channel is the medium carrying the signal. The receiver accepts the signal from the channel and converts it back into the original message. The destination is the intended recipient of the message. The noise source introduces errors during transmission; it interferes with the signal, distorting it and so affecting the transmitted message. Cryptography uses the same model. Shannon also introduced two other basic concepts, confusion and diffusion, in his 1949 follow-up paper, Communication Theory of Secrecy Systems. We’ll look at these in more detail below.
In his 1949 paper on secrecy systems, Shannon explained that the cipher-text (encrypted text) should reveal as little as possible about the key (usually a string of numbers or letters stored in a file which, when processed through a cryptographic algorithm, can encode or decode data). In other words, the relationship between the key and the cipher-text should be as complex as possible, and the cipher-text should appear as random as possible, exhibiting no discernible patterns. This is what he meant by confusion. Diffusion concerns the spread of influence: if a small part of the plaintext is changed, this should have a widespread knock-on effect, changing the cipher-text throughout its scope, so that the statistical structure of the plaintext is dissipated across the cipher-text. Modern ciphers are also expected to respond in the same way to a small change in the key. Without sufficient confusion and diffusion, it is possible to deduce the key by analysing plaintext alongside the corresponding cipher-text.
Entropy is defined in the dictionary as a lack of order or predictability, or as a gradual decline into disorder. In cryptography it represents the amount of randomness in a transmission. Any cryptographic algorithm should produce cipher-text with as much entropy as possible in order to hide the original plaintext from anybody who might examine it. Plaintext transmissions contain order that can be discerned by an observer even if they do not understand the language being used, and it is this order that ultimately leads to understanding, whether the observer is faced with a language to be decoded or an encrypted stream of text. Thus, discernible order is the enemy of encryption and entropy is to be encouraged.
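To put a number on this idea, here is a minimal Python sketch that estimates Shannon entropy in bits per byte. The function name shannon_entropy and the sample inputs are my own illustrative choices, and the output of os.urandom is used purely as a stand-in for well-encrypted cipher-text. A byte-frequency estimate like this is a rough indicator only, since it ignores the order in which the bytes appear.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Estimate Shannon entropy in bits per byte (0.0 = no surprise, 8.0 = maximum)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # H = -sum(p * log2(p)) over the observed byte frequencies
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    english = b"the quick brown fox jumps over the lazy dog " * 50
    random_like = os.urandom(len(english))  # assumption: stand-in for cipher-text

    print(f"English text : {shannon_entropy(english):.2f} bits/byte")
    print(f"Random bytes : {shannon_entropy(random_like):.2f} bits/byte")
```

Ordered plaintext should score noticeably lower than the random-looking bytes, which is exactly the gap an observer exploits.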
Random number generation
Randomness is essential for effective encryption, and you may be surprised to learn that a computer cannot easily generate it. Everything a computer does is under the instruction of an underlying process or algorithm, which is what makes this a significant challenge. On reflection this may come as no surprise, since a computer is about the ultimate expression of determinism, and it is hard to see how one might mechanistically program a function or algorithm to generate randomness. Of specific relevance to this post, cryptographic processes and algorithms need randomness to be secure: random numbers are required for key distribution, session key generation, generating keys for cryptographic algorithms, and the generation of bit streams and initialisation vectors (IVs).
For computers to generate random numbers, they must capitalise on sources of randomness and unpredictability. Shannon posited that to achieve randomness, two important components are required: a uniform distribution, and independence. In a uniform distribution, zeros and ones occur with equal frequency. Independence means that no bit can be inferred from the others. Unpredictability follows from this: each number is statistically independent of the other numbers in the sequence. The random numbers a computer generates are either true random numbers or pseudorandom numbers, so randomness is produced either by a true random number generator (TRNG), also known simply as a random number generator (RNG), or by a pseudorandom number generator (PRNG).
As stated previously, deterministic computers cannot generate random numbers; that is to say, they cannot do it without external assistance. A TRNG must therefore use an external, non-deterministic source of entropy, together with a function that takes the externally provided randomness as its argument and produces a suitable random number (a process known as entropy distillation). The input is typically a source of entropy from the physical environment, such as keystroke timing patterns, disk electrical activity or mouse movements; this source is combined with the processing function to generate random output in the form required. A PRNG, by contrast, stretches a short starting value (the seed) into a long sequence using a deterministic algorithm, so its output is only as unpredictable as the seed.
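The difference between a deterministic generator and one fed by external entropy can be sketched in a few lines of Python. The function names here are hypothetical, chosen for illustration. Note the assumption: the secrets module draws from the operating system’s entropy-seeded generator, which is the nearest thing to a TRNG available from the standard library, not a hardware TRNG in its own right.

```python
import random
import secrets


def prng_sequence(seed: int, count: int = 5) -> list[int]:
    """Deterministic PRNG: the same seed always reproduces the same numbers."""
    rng = random.Random(seed)  # Mersenne Twister, not suitable for generating keys
    return [rng.randrange(256) for _ in range(count)]


def os_random_key(length: int = 16) -> bytes:
    """Bytes from the operating system's generator, which is seeded from
    environmental entropy (assumption: standing in for a TRNG here)."""
    return secrets.token_bytes(length)


if __name__ == "__main__":
    print("PRNG, seed 42 :", prng_sequence(42))
    print("PRNG, seed 42 :", prng_sequence(42))  # identical to the line above
    print("OS randomness :", os_random_key().hex())
    print("OS randomness :", os_random_key().hex())  # different on every call
```

Anyone who learns the seed can reproduce the whole PRNG sequence, which is why keys, nonces and IVs should come from the entropy-backed source.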
In order to understand the concept of freshness properly, it helps first to examine a replay attack. In a replay attack, an attacker records the network traffic involved in logging in to a system and then plays the recording back, effectively resending what had been sent before, to gain access for themselves. This works even though the recorded traffic may have been hashed or obfuscated.
Freshness is something of an abstraction, which makes it less than intuitive, but it effectively means ensuring that each time we transact with a system in order to access it, the traffic exchanged will never be the same as it was before. Now, in a world where we don’t want to change our passwords every time we log off, that seems like a challenge, but it’s not too much of one.
The way it is achieved is for the server to generate a one-time pseudorandom number which is sent to the client at the start of the authentication exchange. This number, which will only ever be used once (hence the name nonce, for number used once), is typically combined with the password, for example by concatenating and hashing the two, before transmission to the server. Because the required response changes with every nonce, an attacker cannot anticipate it, and a recorded response is useless in a replay attack.
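The exchange described above might look something like the following Python sketch. Everything here is illustrative: the Server class and client_response function are hypothetical names, and I have combined the password and nonce with HMAC-SHA256 rather than plain concatenation, which is a substitution on my part and not a description of any particular protocol.

```python
import hashlib
import hmac
import secrets


def client_response(password: bytes, nonce: bytes) -> bytes:
    """Client side: bind the password to this particular nonce."""
    return hmac.new(password, nonce, hashlib.sha256).digest()


class Server:
    """Hypothetical server that issues nonces and accepts each one only once."""

    def __init__(self, password: bytes) -> None:
        self._password = password
        self._outstanding: set[bytes] = set()

    def issue_nonce(self) -> bytes:
        nonce = secrets.token_bytes(16)
        self._outstanding.add(nonce)
        return nonce

    def verify(self, nonce: bytes, response: bytes) -> bool:
        if nonce not in self._outstanding:
            return False  # unknown or already used nonce: treat as a replay
        self._outstanding.discard(nonce)  # a nonce is only ever valid once
        expected = client_response(self._password, nonce)
        return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    server = Server(b"correct horse battery staple")
    nonce = server.issue_nonce()
    response = client_response(b"correct horse battery staple", nonce)
    print("first login  :", server.verify(nonce, response))
    print("replayed copy:", server.verify(nonce, response))
```

The password itself never changes, yet the traffic on the wire is different for every login, which is the whole point of freshness.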
One-time pad (OTP)
One-time pads are a mainly theoretical encryption device which, in theory, provide the strongest possible cipher. This carries some caveats, however, in that the key must be generated and used within a strict set of rules. The theory behind the one-time pad is that the key must be at least the same length as the plaintext message and that the key must be truly random. The key and the plaintext are then combined using the most fundamental of encryption devices, the modulo 2 adder, otherwise known as an exclusive OR gate. The result, given a secure key which has not been compromised, is cipher-text which has no statistical relation to the original plaintext. To decrypt, the same key is combined with the cipher-text using the same operation. For this to be completely secure in theory, the following rules must be observed without exception:
- The OTP (key) must be truly random
- The OTP must be at least as long as the original plaintext
- Only two copies of the OTP should exist
- The OTP must be used only once
- Both copies of the OTP must be destroyed immediately after use
The OTP process is absolutely safe if and only if the preceding rules are strictly obeyed. Before computers this was a time-consuming and error-prone task, which rendered it all but impractical unless carried out by machines such as dedicated telegraphy encryption/decryption devices; nowadays, however, it could conceivably be automated by a computer.
Surprisingly perhaps, manual OTP ciphers are still being used today for sending secret messages to agents (spies) via what are known as numbers stations, or one-way voice links (OWVL), which typically use HF transmission and can routinely be heard on the short wave (HF) radio bands.
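As a rough illustration of how a computer could automate the process, here is a minimal Python sketch of one-time pad encryption using XOR. The function names are mine, and the pad comes from secrets.token_bytes, which is the operating system’s generator and not a truly random source, so strictly speaking this sketch does not satisfy the first rule. It also does nothing about distributing or destroying the pad.

```python
import secrets


def otp_encrypt(plaintext: bytes, pad: bytes) -> bytes:
    """XOR each plaintext byte with the corresponding pad byte."""
    if len(pad) < len(plaintext):
        raise ValueError("pad must be at least as long as the message")
    return bytes(p ^ k for p, k in zip(plaintext, pad))


# XOR undoes itself, so decryption is the very same operation.
otp_decrypt = otp_encrypt


if __name__ == "__main__":
    message = b"attack at dawn"
    pad = secrets.token_bytes(len(message))  # assumption: stand-in for a truly random pad

    ciphertext = otp_encrypt(message, pad)
    print("cipher-text:", ciphertext.hex())
    print("recovered  :", otp_decrypt(ciphertext, pad))
```

Reusing the same pad for a second message would break the fourth rule and let an eavesdropper XOR the two cipher-texts together to cancel the pad out.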
When it comes to encryption, one of the most attractive properties an algorithm can have is known as the avalanche effect, in which a very small change to the input, whether to the key or to the plaintext, produces very different cipher-text. Two nearly identical keys, for example, should generate very different cipher-text for the same input plaintext. This is closely related to the confusion and diffusion that Shannon defined as desirable. We can therefore compare the efficacy of two different encryption algorithms with reference to the avalanche effect they bring about. The plaintext and the encryption key are represented in binary before encryption. The avalanche effect is then measured in two ways: by changing one bit of the plaintext whilst keeping the key constant, and by changing one bit of the key whilst keeping the plaintext constant, in each case counting how many bits of the cipher-text change. Ideally about half of them do. Empirical results show that the most secure algorithms exhibit the strongest avalanche effect.
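The measurement can be sketched in Python as follows. One loud assumption: the standard library has no block cipher, so I use HMAC-SHA256 as a stand-in keyed function. It is not an encryption algorithm, but it shows the same one-bit-in, many-bits-out behaviour. The helper names, key and message are all hypothetical.

```python
import hashlib
import hmac


def keyed_output(key: bytes, message: bytes) -> bytes:
    """Stand-in for a cipher (assumption): HMAC-SHA256 of the message under the key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    flipped = bytearray(data)
    flipped[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(flipped)


def differing_bit_fraction(a: bytes, b: bytes) -> float:
    """Fraction of bit positions at which two equal-length byte strings differ."""
    diff = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return diff / (8 * len(a))


if __name__ == "__main__":
    key = b"0123456789abcdef"
    message = b"transfer 100 pounds to account 42"
    baseline = keyed_output(key, message)

    changed_plaintext = keyed_output(key, flip_bit(message, 0))
    changed_key = keyed_output(flip_bit(key, 0), message)

    print(f"one plaintext bit flipped: "
          f"{differing_bit_fraction(baseline, changed_plaintext):.1%} of output bits changed")
    print(f"one key bit flipped      : "
          f"{differing_bit_fraction(baseline, changed_key):.1%} of output bits changed")
```

To compare real ciphers you would swap keyed_output for the encryption function under test and average the result over many keys, messages and bit positions.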
Auguste Kerckhoffs was a linguist and military cryptographer in the late 19th century who published a number of essays on contemporary cryptography. A quote attributed to him from 1883 states:
‘Military cipher systems should not require secrecy, and it should not be a problem if they fall into enemy hands; a cryptographic system should be secure even if everything about the system, except the key, is public knowledge.’ Auguste Kerckhoffs (1883)
In other words, the key must be kept secret, not the algorithm.
The XOR function has a unique place in cryptography, and you may well ask: why this function? Why not the AND gate or the OR gate? The answer is strikingly simple. In a nutshell, the XOR operation is reversible. If a string of binary data is passed through the XOR function bit by bit along with a key, then the output, if passed as an input to another XOR function along with the same key, will produce the original input.
XOR is a binary operation usually described as a logic gate, in a way that facilitates analysis with Karnaugh maps and the rest of the digital logic canon. Taken out of the realm of digital logic circuits, however, it is simply modulo 2 addition, that is, addition of single bits without a carry. We will go on to investigate the utility of XOR/mod 2 adders in more detail in future posts and I will add links to them below as they are published.
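Both claims, that XOR is reversible where AND and OR are not, and that XOR is the same thing as modulo 2 addition, are small enough to check exhaustively. The following Python sketch does so over single bits; the function names are my own.

```python
import operator
from itertools import product


def is_reversible(op) -> bool:
    """True if, for every key bit, the data bit can be recovered from the output."""
    for key in (0, 1):
        outputs = {op(data, key) for data in (0, 1)}
        if len(outputs) < 2:  # both data bits gave the same output: information lost
            return False
    return True


def xor_matches_mod2_addition() -> bool:
    """Check that a XOR b equals (a + b) mod 2 for every pair of bits."""
    return all((a ^ b) == (a + b) % 2 for a, b in product((0, 1), repeat=2))


if __name__ == "__main__":
    gates = (("XOR", operator.xor), ("AND", operator.and_), ("OR", operator.or_))
    for name, op in gates:
        print(f"{name}: reversible = {is_reversible(op)}")
    print("XOR equals modulo 2 addition:", xor_matches_mod2_addition())
```

AND with a key bit of 0 always outputs 0, and OR with a key bit of 1 always outputs 1, so in both cases the data bit is lost for good. XOR never collapses two inputs into one output, which is why it can be undone.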
Confidentiality, Integrity and Availability
Confidentiality, Integrity and Availability are collectively known in the cyber security world as the CIA triad, and represent the three main elements of a fundamental model of the properties of a secure system. For example, a banking application, whose security must certainly be beyond doubt, must exhibit confidentiality (it must prevent any unauthorised access), integrity (it must be a true reflection of the reality of the bank accounts and safe from unauthorised modification) and availability (it must be usable whenever it is needed). In cryptography we are less directly concerned with availability, as it is largely outside the scope of the field; however, a secure system will most certainly lose its availability if it loses its confidentiality or suffers a compromise of its integrity.
Again, we will look at these premises in more detail in other posts so it is sufficient for now to state them in this context.
Were it not the case that CIA presents such a handy mnemonic, non-repudiation would very likely be included alongside confidentiality, integrity and availability as the four principles of a secure system. As it is, it has become an addendum to the axiom, but an essential one nonetheless.
In a secure system we can think of non-repudiation as the application of an audit trail, provided perhaps by logging. By logging every transaction, interaction and modification of a system and, more importantly, the identity of the entity responsible for initiating each of these actions, we ensure that no entity can later deny what it did and that the system under scrutiny is able to maintain its status.
By providing mechanisms which ensure non-repudiation we provide an important means of ensuring the survivability of a system through compromise and beyond, hopefully restored to a secure state once again.
For readers familiar with the reactive/proactive bowtie model of system assurance, non-repudiation sits very firmly on the reactive side but is essential to a survivable, robust, secure environment.
Data origin and entity authentication
The subject of the authentication of data origin and entities is a complex and detailed one. Essentially, it is the affirmation that the entity believed to be the source of some data in an interaction is who or what it is believed to be. Ultimately, the receiver needs to have confidence that the message has not been intercepted or modified in transit, and this is accomplished by verifying the identity of the source of the message.
So, how do you prevent an attacker from manipulating messages in transit between a transaction source and its destination? The major considerations are as follows:
- The recipient must confirm that messages from a source have not been modified along the way.
- The recipient must confirm that messages from a source are indeed from that source.
Data Origin Authentication is the solution to these problems. The recipient, by verifying that messages have not been tampered with in transit (Data Integrity) and that they originate from the expected sender (Data Authenticity), addresses both considerations listed above. Entity authentication assures that entities are currently and actively involved in a communication session. In cryptography, this can be done by using freshness, as discussed earlier.
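One common way to give a recipient both assurances at once is a message authentication code (MAC) computed with a key that only the sender and recipient share. The post does not prescribe a mechanism, so treat the following Python sketch as one possible illustration only; the function names, key and messages are hypothetical.

```python
import hashlib
import hmac


def tag_message(shared_key: bytes, message: bytes) -> bytes:
    """Sender side: compute an HMAC-SHA256 tag to send along with the message."""
    return hmac.new(shared_key, message, hashlib.sha256).digest()


def verify_message(shared_key: bytes, message: bytes, tag: bytes) -> bool:
    """Recipient side: accept only if the tag matches this message under the shared key."""
    return hmac.compare_digest(tag_message(shared_key, message), tag)


if __name__ == "__main__":
    key = b"key known only to sender and recipient"
    message = b"pay 50 pounds to Alice"
    tag = tag_message(key, message)

    print("untouched message:", verify_message(key, message, tag))
    print("tampered message :", verify_message(key, b"pay 5000 pounds to Mallory", tag))
    print("wrong sender key :", verify_message(b"some other key", message, tag))
```

A matching tag tells the recipient that the message is unmodified and that it came from a holder of the shared key. It says nothing about when the message was sent, which is where the freshness mechanisms discussed earlier come in.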
The three states of data
In distributed computer systems such as computer networks of any size, the data held, shared and processed by the computational hosts is defined as being in one of three states: at rest, in transit or in use. Ultimately it is the job of, and indeed the raison d’être for, our cryptographic algorithms to protect this data, and in each of these three states that job takes on a slightly different form.
Data at rest is data that is stored on storage media such as magnetic disk drives, optical media or tapes. All data at rest has to be physically stored in some form of device. Where it requires confidentiality and integrity protection it must be encrypted. The discussion of this topic must wrestle with architectural considerations such as whether to encrypt the whole device or just the files themselves. Even the environmental cost of the energy consumed by large-scale encryption of data at rest falls into this discussion.
Data in transit is data on the move. Moving can mean across space from a satellite to the earth, across oceans through an undersea fibre optic cable, across free space on a town centre public Wi-Fi system, across an office building through the corporate network or even across a computer architecture from the data bus to the CPU. As you likely suspect, each of these cases has its own nuances, but there are also common overarching considerations that apply to all of them. This is an enormous field of activity, so we will let it suffice for now that the treatment of data in transit must be considered carefully and dealt with appropriately.
As you might expect, data in use is data that is currently being processed by a CPU or indeed viewed by an end user as it is displayed on a screen. This classification can sometimes seem a little anomalous, and it’s certainly the state in which it is most difficult to assure ourselves that the data is protected in all the ways that we wish it to be. It’s almost a given that in order to be used data must be decrypted, and therefore encryption offers minimal protection to data in this state.
Thank you for reading this far in a post which could perhaps have seemed a little bit like the contents page of a book. The analogy is, I hope, a sound one, as this post covers some of the most fundamental elements of the cryptographic landscape. I’ve mentioned here and there that I intend to add links from each of the sections of this document to other relevant resources that I post, and hopefully, if enough time has passed between my writing this and your landing here to read it, that will be evidenced above. Check back again in the future and look at the links here, or even the categories and tags for the blog, as it is my firm intention to make this a living document.
It’s a blogger’s cliché, but I would really appreciate your comments, good and bad. I’m committed to displaying them all here, and I will do that, but whatever the motivation it’s cool to get involved in dialogue, whether here on the blog, on Twitter or elsewhere.