|
Attacks against the BEST encryption algorithm
Chaos is definitely not randomness
|
Papers
|
13 September 1999
| by
+Spath.
|
|
|
Courtesy of Fravia's page of
reverse engineering
|
|
fra_00xx 980913 +Spath 0110 NA PC
|
Magistral essay by +Spath. Tackling an +HCU challenge.
Definitely a MUST reading for all reversers. +Spath has written a cryptoanalystycal masterpiece.
| Advanced
|
|
There is a crack, a crack in everything
That's how the light gets in
| |
Rating
|
( )Beginner (X)Intermediate (X)Advanced ( )Expert
| |
A cryptanalysis of a bitstream cipher, in answer to a challenge proposed to the
+HCU.
Attacks against the BEST encryption algorithm
Chaos is definitely not randomness
Written by
+Spath.
Some time ago, Jeremy Likness proposed a cryptographic challenge to the +HCU,
to test the strength of his BEST encryption algorithm. This essay is not a
solution to his challenge, but a cryptanalysis of his cipher that proves
that the BEST algorithm is not secure. This paper first describes weaknesses
of the BEST algorithm, particularly a flaw in the chaotic function. These
weaknesses are then exploited for a chosen-plaintext attack. At last, I
show that these flaws can also compromise the security of the encryption
in ciphertext-only attacks. I conclude that the BEST encryption algorithm
is not secure.
None
The BEST (Bitstream Expert Statistical Translation) algorithm described
at http://www.mindspring.com/~jayke/Bitstream.htm.
The challenge was published at http://fravia.org/jeremy1.htm.
An implementation of BEST can be found at http://fravia.org/jn_essay.htm.
None.
ABOUT THE BEST CHALLENGE
THE BEST CIPHER
I) FROM CHAOS TO ORDER
II) THE CHOSEN-PLAINTEXT ATTACK
III) THE FIRST BYTE
IV) ALL FOR ONE
V) A REAL-LIFE ATTACK
ABOUT THE BEST CHALLENGE
A few words before we start : as it is described on Fravia+'s site,
the BEST challenge is absurd. As pointed out on Fravia+'s board by
Ghiribizzo and others, no serious evaluation can be done with only a
few plaintext-ciphertext pairs and without any description of the
algorithm.
With the rules that Jeremy chose, even if nobody breaks his challenge
in 100 years, it does not prove a single thing about the strength of
his algorithm. On the contrary, a cryptanalysis can much more easily
expose weaknesses of the algorithm and describe various possible attacks.
Fortunately, all the informations needed for a cryptanalysis were available
on Jeremy's homepage, including a description of the algorithm and some C
source code.
THE BEST CIPHER
To fully understand this paper, I highly recommend to get a copy
of the original definition of the BEST cipher on Jeremy's homepage.
However, here's a description of the BEST encryption algorithm,
where :
Ix is the bit number x of the input file (plaintext).
Ky is the bit number y of the key file.
Ox is the bit number x of the output file (ciphertext).
f(n) = 3.56999999*n*(1-n), the chaotic function.
maxy is the size of the key file (in bits).
^ is the XOR operator.
~ is the NOT operator (bit inverse).
% is the modulus operator.
Norm is a normalizing function that keep the decimal
part of a number, i.e. between 0 and 1.
A0 = Encryption algorithm
1. n = seeder value
2. r = 3.56999999
3. For each x
4. Ox = Ix ^ Ky
5. Ky = Ox
6. If n > 0.5 then Ox = ~Ox
7. If Ix = 0 then n = n + 0.33333333
8. Else n = n + 0.66666666
9. n = Norm(f(f(n)))
10. If Ox = 1 then
11. y = (y +1) % maxy
12. If y % 8 = 0 then
13. { Ky = Ky ^ Ky-8, . . ., Ky+7 = Ky+7 ^ Ky-1 }
14. End of for loop
The seeder value is computed by looping through the key file for the target
file length. Later, I will use the notation Kx(y) when I refer to bit y of
key byte x (x and y lowest values are 0). I will use the same notation for
the plaintext bytes (I) and the ciphertext bytes (O).
Now let's write the dependency equations of this algorithm :
Ox = F(Ix, Ky, n) (A)
Ky = F(Ox, Ky-8) (B)
n(k+1) = F(Ix, n(k)) (C)
seed = F(K, size(I)) (D)
I think that equations B, C and D show weaknesses of the algorithm, I will
use B and C for my attack.
PART ONE : FROM CHAOS TO ORDER
Let's make a little experiment : fire Hiew and create a plaintext file
filled of '00' bytes ; choose any file on your hard disk as key file
and encrypt the plaintext with BestWin. Now look at the ciphertext :
it's almost completely filled by '00' bytes.
What happened ?
If we submit an 'all 0' input file, we can simplify steps 4 and 7 and
remove steps 5 and 8 from the algorithm, so that we have :
3. For each x
4. Ox = Ky
6. If n > 0.5 then Ox = ~Ox
7. n = n + 0.33333333
9. n = f(f(n))
10. If Ox = 1 then
11. y = (y +1) % maxy
12. If y % 8 = 0 then
13. { Ky = Ky ^ Ky-8, . . ., Ky+7 = Ky+7 ^ Ky-1 }
14. End of for loop
Now if we study Xn+1 = f(f(Xn)), we see that something very strange
happens : whatever initial value you chose, after some time, the
normalized chaotic function get stuck and converges to 2 values.
Look at these succesive values of n after step 7 ; the left column
lists the values for even bits and the right column the values for
odd bits :
even bits odd bits
0.8923637046805643 0.8377088795512472
0.8923637046808309 0.8377088795507856
0.8923637046808787 0.8377088795507024
0.8923637046808873 0.8377088795506879
0.8923637046808889 0.8377088795506854
0.8923637046808890 0.8377088795506848
0.8923637046808892 0.8377088795506848
0.8923637046808892 0.8377088795506848
... ...
To have a good idea of what happens, you can draw f(f(n)) and g(n)=n.
The f(f(n)) curve has a symetry axis at 0.5 (since f(1-n)=f(n)), it has
1 minimum and 2 maxima, one of them being (0.8320, 0.8923), which explains
the convergence. We see also that the second extremum is very close from
the curve of g(n), and that the distance between the two maxima is very
close from 0.33 (modulo 1).
So after enough '00', the chaotic function will just give out two values,
0.8923637046808892 for even bits and 0.8377088795506848 for odd bits (these
values may be switched for other keys).
Both normalized values are greater than 0.5, so we can simplify step 6, and
after enough rounds we will have this algorithm :
3. For each x
4+6. Ox = Ky^1 = ~Ky
7. n = n + 0.33333333
9. n = f(f(n))
10. If Ox = 1 then
11. y = (y +1) % maxy
12. If y % 8 = 0 then
13. { Ky = Ky ^ Ky-8, . . ., Ky+7 = Ky+7 ^ Ky-1 }
14. End of for loop
Now suppose we reach a '1' bit of the key, i.e. Ky = 1, we have therefore
Ox = 0, the y index is not incremented and since n will remain stuck, y
will not change any more : all following bits will be assigned to this ~Ky
null bit and the rest of the output file will be all "00".
This is particularly striking with a 'all 00' file, but this also happens
with shorter patterns. So the first conclusion is : depending on the key,
any long enough "00" pattern in the plaintext will create a "00" pattern
in the ciphertext (or in other words, we can guess sequences of null bytes
of the plaintext).
PART TWO : THE CHOSEN-PLAINTEXT ATTACK
To illustrate this chosen-plaintext attack, I will take as example a
simple 25 bytes long key ; here's its hex dump :
54 65 73 74 6f 6E 73 20 75 6E 20 6E 6F 75 76 65
6C 6C 65 20 66 6F 69 73 20
The plaintext that will be encrypted is a file defined as follows :
In = 01h if n=16+8*k, k being any integer
In = 00h elsewhere
(in other words, all the plaintext bytes are null, except I16, I24,
I32, ... that are equal to 01h).
All the "00" bytes are used to unbalance the algorithm in the previously
described state. The file also contains a repetitive pattern of 8 bytes
containing only one set bit : this bit is used to get next sequence of
'1' bits of the key, as you will see later. For this example, I took a
8016 bytes long plaintext ; however, this method should work for any
plaintext that is at least 30 times bigger than the key.
Whatever key you chose, some patterns appear in the output file,
i.e. after enough rounds we can identify a periodic byte sequence.
Jeremy tried to prevent patterns from appearing in the output file
by modifying the key at each used byte (at step 13). But since the
original key is restored after all its bytes have been used, an input
file with repetitive patterns and fixed n values will also cause
patterns to appear in the output file.
Since we know that the key index y is incremented only if the output
bit is '1' (at step 10), the conclusion is : the size of the key is
equal to the number of '1' bits in a period of the output file.
With our example, the smallest pattern is 66 bytes long, starting at
offset 21, and it contains 200 bits with the value '1', so we know that
the key is 200 bits long (25 bytes).
PART THREE : THE FIRST BYTE
Let's focus now on the first bytes of the ciphertext ("00" bytes have
been skipped, the bits that have an underscore on them were already '1'
in the plaintext) :
byte nb : 0 16 24
value : 07h 05h _ 3Fh _
0000 0111b 0000 0101b 0011 1111b
Before we try to recover some bits of the key, we must be aware of an
undocumented "feature" of the algorithm : the initial n value (seed) is
calculated from the bytes of the key, but the pointer in the keyfile
is not reset after this job. Therefore, the first byte of the plaintext
is (most of the time) not encrypted by the first byte of the key. So our
first job is to find which byte of the key was used to compute these 3
ciphertext bytes.
We know that the seed calculation is done by looping through the key file
for the target file length, and at this moment we know both file sizes,
so we can find this information. In our testcase, I encrypted a 8016 bytes
plaintext and I found out that the key is 25 bytes long so the first key
byte used for encryption will be : (16016 % 25) = 16. So the first plaintext
byte (I0) is encrypted by K16 (the 17th byte of the key file, since we
start from K0).
Now let's come back to equation (C) : n(k+1) = F(Ix, n(k))
That means that if I know the input file (this is the case here) and one
value of n, then I also know all future values of n. Hey, but we do know
some value of n, since after some '00' patterns, n is equal to one of the
two values we saw before. So when the first '1' bit of the plaintext is
processed (that is I16(0)), n is equal to 0.8923637046 or 0.8377088795.
Let's assume for now that the value of n when the first bit of I16 is
processed is the second one.
The first three bits of K16 are unknown, since they are used to encrypt
the first byte of the key, and for this one we don't know anything about n.
The fourth key bit is necessarily a '1', since it's the requested value to
unbalance the chaotic function in the state where it continuously gives out
'0' ; so K16(3) = 1.
Now we can recover all other bits of K16, since we can calculate all
future values of n ; that's what we're doing now :
Let's focus on byte 16, we have :
For bit 0 :
4. O16(0) = I16(0) ^ K16(3) = 1 ^ 1 = 0
6. n = 0.8377 => O16(0) = ~O16(0) = 1
8. n = 0.8377 + 0.666666666
9. n = Norm(f(f(n))) = 0.3427
11. y = y +1
For bit 1 :
4. O16(1) = K16(4) = 0 ===> K16(4) = 0
7. n = n + 0.33333333
9. n = Norm(f(f(n))) = 0.6088
(note that this ciphertext bit is always a pure key bit)
For bit 2 :
4. O16(2) = K16(4) = 0
6. n = 0.6088 => O16(2) = ~O16(2) = 1
7. n = n + 0.333333
9. n = Norm(f(f(n))) = 0.5590
11. y = y + 1
No other output bit is at '1', so we are sure that the next key bit is
'1' : so K16(5) = 1. Now we can focus on byte 24 : for bit 0 and bit 1,
it's the same story as for byte 16, and bit 1 is also a pure key bit :
K16(6) = O24(1) = 1
For bit 2 :
4. O24(2) = K16(7)
6. n = 0.6088 => O24(2) = ~O24(2) = 1 ===> K16(7) = 0
7. n = n + 0.333333
9. n = Norm(f(f(n))) = 0.5590
Let's summarize what we found : the first key byte used to encrypt the
plaintext is K16, and its binary value is 0110_1??? ; therefore, there
are only 8 candidates for this 17th key byte (from 68h to 6Fh).
PART FOUR : ALL FOR ONE
I will now show you how to recover the complete key if you know one of
its bytes. As example, I will recover K17 from K16, any other byte of
the key can be found from the previous one with the same method.
Let's suppose we test 6Ch as candidate for K16 (which is the correct
value) ; so we know the plaintext, the corresponding ciphertext, the first
key byte and some values of n. The first thing to remember is that each key
bit is modified after it has been used at step 5 (Ky = Ox). Therefore,
when we're about to use K17, K16 has been modified into another value :
this new value is the first information to find.
Let's see how this happens :
4. Ox = Ix ^ Ky ; here we know Ix and Ky
5. Ky = Ox ; here the key bit can be modified
6. If n > 0.5 then Ox = ~Ox ; we also know Ox after this step
We know Ix and Ky at step 4, so we can calculate Ox at step 4, and therefore
Ky at step 5. We also know Ox after step 6, it is the real output value, so we
can deduce if n was greater than 0.5 at step 6.
Here's a summary of what we have : in the next table, the two first columns
and the last one are known, the other ones can be deduced from the 3 previous
equations :
plaintext K16 temp. output new key value n real output
(0h, 1h, 1h) (6Ch) (step 4) (step 5) (step 6) (7h, 5h, 3Fh)
---------------------------------------------------------------------------
I0(0)=0 K16(0)=0 => O0(0)=0 => K16(0)=0 n>0.5 => O0(0)=1
I0(1)=0 K16(1)=0 => O0(1)=0 => K16(1)=0 n>0.5 => O0(1)=1
I0(2)=0 K16(2)=1 => O0(2)=1 => K16(2)=1 n<0.5 => O0(2)=1
I0(3)=0 K16(3)=1 => O0(3)=1 => K16(3)=1 n>0.5 => O0(3)=0
I16(0)=1 K16(3)=1 => O16(0)=0 => K16(3)=0 n>0.5 => O16(0)=1
I16(1)=0 K16(4)=0 => O16(1)=0 => K16(4)=0 n<0.5 => O16(1)=0
I16(2)=0 K16(4)=0 => O16(2)=0 => K16(4)=0 n>0.5 => O16(2)=1
I16(3)=0 K16(5)=1 => O16(3)=1 => K16(5)=1 n>0.5 => O16(3)=0
I24(0)=1 K16(5)=1 => O24(0)=0 => K16(5)=0 n>0.5 => O24(0)=1
I24(1)=0 K16(6)=1 => O24(1)=1 => K16(6)=1 n<0.5 => O24(1)=1
I24(2)=0 K16(7)=0 => O24(2)=0 => K16(7)=0 n>0.5 => O24(2)=1
Before we used K16 for our encryption, its value was 6Ch. From what we obtain
in the fourth column, after we used K16, its new value is 44h. Therefore,
before being used, K17 will be XORed with 44h at step 13. So we just have
to retrieve K17 by the same method we used in part III, and at the end XOR
its value with 44h to obtain the real K17 value. Let's do it :
byte nb : 24 32 40
value : 3Fh _ 05h _ 0Dh _
0011 1111b 0000 0101b 0000 1101b
So for byte 24, we have :
bit 3:
4. O24(3) = K17(0)
6. n = 0.5590 => O24(3) = ~O24(3) = 1 ===> K17(0) = 0
7. n = n + 0.333333
9. n = Norm(f(f(n))) = 0.8043
11. y = y + 1
bit 4:
4. O24(4) = K17(1)
6. n = 0.8043 => O24(4) = ~O24(4) = 1 ===> K17(1) = 0
7. n = n + 0.333333
9. n = Norm(f(f(n))) = 0.8718
11. y = y + 1
bit 5:
4. O24(5) = K17(2)
6. n = 0.8718 => O24(5) = ~O24(5) = 1 ===> K17(2) = 0
7. n = n + 0.333333
9. n = Norm(f(f(n))) = 0.8684
11. y = y + 1
... and so on until we find K17(7). At last, we obtain K17 = 28h, so that
the real K17 value is 28h ^ 44h = 6Ch. Now we could find K18 from K17 with
the same method, and then, one by one, all other bytes of the key.
Since we had only 8 candidates for K16 and 2 candidates for the initial
n value, we have only 16 possible keys. The correct key among these 16
candidates will appear by itself, because wrong keys will produce
inconsistencies as we will decrypt more and more known plaintext. So this
attack requires at worst 2*2^8= 512 attempts to retrieve the complete key,
which takes a few minutes on a PC.
PART FIVE : A REAL LIFE ATTACK
You may think that the previous attack is just theoretical, since in real
life I will never be able to choose what will be encrypted. Well, you're
quite right, at least for a correct software implementation. But the same
weaknesses can also be exploited for a ciphertext-only attack, as I will
demonstrate now.
Before we start, let's think about what we used for the chosen-plaintext
attack : the known plaintext-ciphertext pair, the key length and one known
value of n. But the key length is not that necessary, and we could have
just assumed that the key was the same size as the plaintext : then, some
repetitive patterns would have appeared in this 'unfolded' key, and a period
of these patterns would be the real key. Also, the plaintext could have been
anything, as long as we knew one value of n. So what we actually need for
this attack is a plaintext-ciphertext pair and a known n value.
Now let's say we found a encrypted .BST file somewhere and we want to decrypt
it. First, we can look for '00' patterns, because we know that these bytes
are also in the plaintext. Some file formats contains lots of '00' bytes at
defined places, and are therefore easy to identify (.EXE, .TAR, .BMP, etc).
From there, an attack can be prepared.
As an example, let's take a encrypted PE executable file. I have easily
identified this format by looking at the '00' patterns in the file, in
the header and also some big spaces that must be used as internal buffers.
Here's the encrypted header of the file :
0000: 07 72 5d 00 97 93 01 00 34 00 00 00 e7 67 0d 00 .r].....4....g..
0010: d8 1a 00 00 00 00 00 00 40 05 00 00 00 00 00 00 ........@.......
0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0030: 00 00 00 00 00 00 00 00 00 00 00 00 80 0e 00 00 ................
0040: 1a 8d 93 0b 00 a4 60 fb 6f d8 0f 04 d9 a3 86 cf ......`.o.......
0050: 1f 60 67 50 45 ce 25 f0 2e 47 aa ed 15 03 2c 87 .`gPE.%..G....,.
0060: 4e e0 23 df a7 50 31 f9 61 d4 6a a1 ea f6 11 6b N.#..P1.a.j....k
0070: 66 cd 74 24 6c 42 69 02 d4 00 00 00 00 00 00 00 f.t$lBi.........
0080: 70 43 07 00 3c 6d 3a 00 57 2a b2 ae 3a 93 07 00 pC...m:.W*..:...
If we look at these bytes from a 'PE header' point of view, we can easily
guess what these bytes actually are :
- bytes 00h-1Fh : the MZ header.
- bytes 20h-3Fh : a sequence of '00' bytes, between the header itself and
the MZ entry point.
- bytes 40h-78h : the MZ stub (the small piece of code that displays the
string and quit) and the string itself ("This program needs Win32 blabla").
- bytes 80h- : the real PE header
Now if we look at these bytes from a 'BEST cryptanalysis' point of view,
we have a sequence of '00' long enough to give us a known value for n, so
if we knew the bytes starting at 40h, then we could start a known-plaintext
attack. Hey, but all MZ code and data bytes (from 40h to 7Fh) are just
prepended to the assembled code by the linker during compilation. That
means that if we know which compiler has been used, then we have 64 bytes
of known plaintext, and therefore we can recover 64 bytes of the key.
Actually, we also have only 3 possibilities for the 6 first bytes of the
PE header, so that we have 70 known bytes. So in a few tens attempts, we
can retrieve 70 bytes of the key.
From here we can continue in two directions, finding more bytes of the
plaintext and finding more bytes of the key. If the key is shorter than 70
bytes, a pattern will appear in the 'unfolded' key we found ; in this case,
our job is done. If the key is longer, we first could try to find its actual
length. To test a candidate for the key length, we calculate which pieces
of the ciphertext will be decrypted by these 70 bytes if the key length we
try is the correct one. Then, we calculate the according n values and decrypt
these pieces of ciphertext ; we obtain one or more pieces of 70 bytes long
plaintext : if all these bytes does not contain any recognizable data resources
(strings, lookup tables, ...) or some code, then we can assume that the key
length we tested was not correct. Unless the program is packed or crypted,
this method can give us some possible key length and some possible pieces
of plaintext.
The same kind of investigations can be performed on various file formats,
and this way some informations can be recovered just from the ciphertexts.
Let's summarize our results : with a single text chosen-plaintext attack,
we can recover the complete key (as a comparaison, the same attack against
DES requires 2^47 plaintexts). In the worst case, this attack requires 512
attempts and can be completed in a few minutes on a PC. Moreover, the same
weaknesses can be used for ciphertext-only attacks, not only to identify
the format of the original file, but also to retrieve pieces of the key.
I conclude that the BEST encryption algorithm is not secure and should not
be used.
The whole strength of the algorithm was based on the assumption that a
chaotic function cannot be predicted ; when it proved to be false, the
full algorithm collapsed. Yet, the risk of chaotic functions and the
difference between chaos and randomness had already been very clearly
explained by Scut in an old essay (fravia.org/crymaco.htm). Nowadays,
many cryptographers are suspicious of chaos, and prefer not to use it
when designing a new cipher. Maybe Chaos is just too difficult to
control to be a reliable tool ?
+Spath. (spath at iname dot com)
--
Interested in cryptanalysis ? read Casimir's essays on the great homepage
of Joe Peschel, at http://members.aol.com/jpeschel/index.htm
I wont even bother explaining you
that you should BUY this target program if you intend to use it for a
longer period than the allowed one. Should you want to STEAL this
software instead, you don't need to crack its protection scheme at all:
you'll find it on most Warez sites, complete and already regged,
farewell, don't come back.
You are deep inside fravia's page of reverse engineering,
choose your way out:
homepage
links
search_forms
+ORC
how to protect
academy database
reality cracking
how to search
javascript wars
tools
anonymity academy
cocktails
antismut CGI-scripts
mail_fravia+
Is reverse engineering legal?