Total Pageviews

Monday, 3 December 2012

Public Key Encryption

Public Key Encryption
Article Index
Public Key Encryption
The problem with keys
Cracking the code
Public key encryption is what makes the commercial Internet work and it provides privacy for anyone who wants it. When you don't know how it works it seems more like magic than anything else. When you do know how it works then there are still time and applications of it that seem magical.
Without it we would find it much more difficult to safeguard financial data such as credit card numbers in transit. What is more it all works behind the scenes and almost without the user knowing anything about it. At most you might see a message telling you that you are about to use a secure connection or server.
You can ignore the details if you want to, but don’t!
How it works is not only amazing but only by knowing how it works can you have any hope of evaluating how safe it all is.

Cryptography and codes

Traditionally cryptography has used a key to code a message and a key, usually the same key, to decode the cipher text.
The key is used in a process that transforms the message into the cipher text in such a way that it should be impossible to extract the message from it without the key.
You can think of this process as a sort of mixing of the message data and the key. The mixing has to be thorough enough to hide both the message and the key but it has to be structured enough to be relatively easy to undo if you have the key. How successful all of this is depends very much on how easy it is to detect the key or the original message in the cipher text.
For example, one of the simplest codes adds a constant to the character codes of each of the characters in the message. The constant is the key and the adding mixes the key with the original message data. To see how to do this you simply have to assign each character a number and then add the constant.
The only additional complication is that you have to use modular or “clock” arithmetic so that adding one to 26 takes you back to 1 i.e. z is coded as a in this case. 
In this case the code is very easy to break. As soon as you have identified a single letter you can work out the constant, i.e. the key and subtract it from the rest of the cipher text.

Symmetric Key Encryption

It takes quite a bit of work to come up with a process that hides the text and the key well enough to make it difficult to break but it is possible.
Many workable encryption systems are based on this secret key encryption system. As long as the key is kept safe so is the message and here we have the problem.
How do you get the key from the source to the destination without it being at risk.
For example, there would be little point in transmitting the key to a web server just before sending an encrypted credit card number!
One solution is to pay trustworthy couriers to transport keys from one place to another in locked containers. Believe it or not this actually happens for data that has to be protected at the highest level of security. Once the key is safely with the source and the destination then data can be sent with hardly any fear that it can be cracked.
In principle the message is as secure as the key. If the key falls into the wrong hands then the game is up and the message can be read by the wrong person.
This is the way a one "time pad" works - a key as long as the message is used to encode it. Without the one time pad to decode it then it is very difficult to crack the code.

fig1
Symmetric key cryptography

This sort of encryption is called symmetric key because the same key is used for coding and decoding.
Symmetric key codes are very popular because they are fast to compute and very safe. In fact modern symmetric key codes are so secure that they don’t rely on keeping the encoding process a secret.
That is, even though you know exactly how the coding/decoding is performed, without the key the only way to decode a message is to try every key.
Two well-known symmetric key codes are DES and IDEA and the precise details of both are freely available. Unless someone is keeping it a big secret then the only way known to crack these codes is to try every possible value of the key.
What this means in practice is that security is proportional to key length. The DES code uses a 56-bit key and, given enough money it is conceivable that you could try all 2^55 combinations (i.e. 36,028,797,018,964,000)  in a reasonable time.
Of course when it was invented back in 1975 even a supercomputer of the time would have taken far too long trying every key. Today it is reasonable to assume that government agencies can crack a DES code any time they feel like it.
IDEA, the code in the PGP (Pretty Good Privacy) program, is a suitable replacement for DES and it has a 128-bit key. Even with the computer power available today it is difficult to believe that a 128-bit key can be broken simply by trying every combination. When you try to estimate how long it would take, even with huge numbers of computers on the job, 2^128 is such an enormous number that times greater than the age of the universe are what you usually get!

Public key cryptography

Modern symmetric key cryptography works very well but it suffers from the problem of transporting the keys and keeping the keys secret. Imagine that you want to buy something from a website. If you want to send your credit card number using a symmetric cipher then either you have to send the website your key or it has to send you its.
Forget for a moment the transport problem which could presumably be solved by sending a letter, i.e. using the post, or by paying it a visit in person. If key transport were the only way to make things work even existing institutions like banks would take on the job of securely transporting keys – for a fee of course!
Even if you can transport keys there is still a problem – once the website has your key it can use it to encode information to be sent to you and decode information you send to it but it can also use it to decode any message you send using that key!
The key both locks and unlocks the message and it doesn’t matter whom the message was intended for. What this means is that you would need to set up a different key for every website you wanted to work with.
So many keys are bound to make the system less secure and error prone and it is lucky for us that there is an alternative.
Public key encryption seems so unlikely a process that you have to see it in action to believe that it works! The basic idea is that now we have two keys – a public key and a private key.
The public key can be freely handed out to anyone who wants it and it is used to code a message and send it to the destination.
The magic is that the coding procedure mixes up the message and the public key in such a way that even knowledge of the public key doesn’t help in the task of unmixing.
To unmix the public key and the message you need a second key – the private key – which is, of course, kept secret. Using the private key you can easily undo the mixing and retrieve the message.
Not only does the encryption process hide the original text in a way that cannot be undone by the public key even knowledge of the public key doesn’t give any clue as to what the private key, its unique counterpart, might be.
This is all very mysterious! How exactly does this work?
The whole thing relies on an idea called a "trapdoor" function. This is a function of the data that is very difficult to reverse - in other words the inverse function is much more difficult to compute than the function itself. However in some cases if you provide a tiny bit of extra information then finding the inverse function becomes easy.
For example, if I multiply two primes together (recall that a prime number is one that doesn’t have any factors – so 7 is prime as no integer exactly divides it but 9 isn’t because it is divisible by 3) e.g. 3 times 7 giving 21 then to compute the function all I need is simple multplication.
To perform the inverse operation I have to find the prime factors of 21 and this is more difficult and it gets much more difficult as the size of the numbers involved goes up. Essentially the only direct way of doing it is to try each of the primes in turn until you find one that divides the number exactly.
However if I tell you that one of the primes was 3 all you have to do is divide by three to recover the 7. The problem of prime factoring has a "trap door" that leads you straight to the answer if you know one of the primes involved.
There are a number of public key systems based on exactly this prime factor principel and one of the best known is the RSA cipher so lets take a look at this in detail.

RSA

To create the key you first start out by picking two large prime numbers.

Finding large primes isn’t too difficult but to keep the arithmetic simple let's pick small primes for our example: p=3 and q=11.
The product of 3 and 11 is 33 and this is the first part of our key
n=p*q=33. 
As well as the product of the primes we also use the product of one less than each of them, i.e.
m=(p-1)(q-1)=2*10=20 
as another part of a key.
Next we need another prime, d, in the interval [max(p,q)+1,n-1]. In our case the interval is [12,32] and we can pick d=13 as a suitable prime. That is
d=13 is a prime in the interval  [max(p,q)+1,n-1]==[12,32]
This all sounds complicated but we are almost done. We just need one more number to complete the keys.
We need a value e such that
e*d=1 Mod m
If you need a refresher on the Mod function then see Mod function. 
But put as simply as possible the n Mod m is the remainder when you divide n by m. Another way of think of this is to "clock arithmetic" and allow values to wrap round at m.
In this particular case  we need to find a number e, an integer, that when you multiply by d and allow the result to wrap round at m you get the result 1.
For example, 4*2 Mod 7 = 8 Mod 7 = 1 Mod 7

fig2.Modular arithmetic in action

There is a fairly simple well-known algorithm for working out modular inverses, as e in the formula above is called, so in practice you don't have to guess or spend hours working it out from scratch.
In our case the problem is to find an e such that
e*13 Mod 20 =1 
and you can verify that e=17 gives the required result.
Now we have a lot of numbers and it is worth summarizing what they all are:
  1. From two primes p and q we compute
    n=p*q =33
    and
    m=(p-1)*(q-1)=20
  2. We pick another prime d = 13 and work out e so that
    e*d mod m =1
    i.e. e=17 in this case
The two numbers e and n can be published as the public key (e,n) and d and m are keep secret as the private key (d,m).
If you want to send a coded message all you have to do is use:
C=(M^e) mod n
where M is the value that corresponds to the data C is the encrypted value and M^e means raise M to the power of e. Notice that to encode the value we use both parts of the public key e and n.
For example, the letter “E” is the fifth letter i.e. M=5 and our public key is (17,33) therefore the encrypted character is:
C=5^17 mod 33 = 762939453125 mod 33 =14
So to transmit an encrypted “E” we send a “14”.

Cracking the Code?

This might not seem very secure but consider the problem of someone wanting to crack the code.
They know the public key (e,n) and they know the code word – 14 and to recover the message they have to solve:
C=14=M^17 mod 33
That is, they have to find M such that it gives 14 when raised to the 17 power mod 33. This turns out to be a very difficult problem – unless you happen to know d.
In fact any solution to the problem is equivalent to knowing d. If you do know d then you can write down the solution at once:
M=C^d mod 33 = 14^13 mod 33 =
                     793714773254144 mod 33 =5
which is of course the number we first thought of as a letter “E”.
Now suppose that you don’t know d but want to find it to crack the code.
This should be easy as we all know that
e*d mod m=1 
and this is reasonably easy to solve, but wait, we don’t know m!
If you recall the private key is (d,m) and all we know is (e,n). However surely knowing n we can work out m because n is just p*q and m is just (p-1)*(q-1). So all we have to do is find the prime factors of n, use these to construct m and finally solve to find d and again - all is lost!

Finding factors

This may sound complicated but to gain the information that has been encrypted it is worth it and after all factoring a number is easy – wrong.
Factoring a small number is easy.
For example, in our case we can find the prime factors of 33 by trial and error. All we have to do is test to see if each prime smaller than 33 divides 33. In fact most of us are so good at arithmetic that we immediately notice that 33=3*11 and so p=3 and q=11. From this we can recover the first part of the private key m=(p-1)*(q-1)=20 and from this we can find d as the solution to
e*d mod m = 17*d mod 20 =1
which is very easy.
So the security of the entire scheme depends on how hard it is to factor a number.
To see just how hard, try factoring 53357, which I promise is the product of two primes.
If you manage that try 62773913 and so on.
Public key cryptography generally uses very big primes which makes the task of recovering them very, very difficult. So it all works and it is all secure – unless you know how to factor numbers really fast!
Currently this is still a problem that is difficult although mathematicians are finding short cuts public key cryptography seems to be secure and quantum computing promises to solve the problem in just one weird quantum step - A Quantum Computer Finds Factors
Finally one last complication because public key cryptography is computationally intensive it generally isn't used to encode lots of data. Instead it is used to exchange keys which are then used in symmetric encryption methods. In this sense public key cryptography is the answer to the key exchange problem we mentioned earlier. That is you use public key cryptography to start the transaction and then shift to a good old symmetric key method once the keys have been exchanged.

fig3
Public key encryption is really used to send keys which are then used to send the real data