An installshield Decompiler
Advanced reversing
by fravia+
fra_00xx 981030 adq 0010 AD 0T
Our tools
I used wisdec to explore the compiled files, changing scripts, recompiling them, and observing the differences reported by wisdec. It took a couple of evenings to divine the format of an installshield 2 file in this manner.
Here, I shall describe a couple of things about installshield files which are essential to understand the following:
The first thing written was a parser for the header. This code is fairly simple: it reads values in from the file, processes them, and stores them in appropriate data structures.
However, the main script decoder is rather more complex. It involves three passes through the script code:
The first pass actually reads the raw opcodes from the file, and transforms them into an internal structure describing the code. This is implemented as a massive table-driven algorithm. The table is keyed by opcode. Each entry contains a function pointer to a specific parser function, along with some extra information, such as the parameter count. For an "installshield system function", a generic decoder function is available, since they all have the same format. The main loop of this stage reads in an opcode, looks it up in the table, and executes the associated function there. This function takes care of the specific processing for that opcode, before returning to the main loop. This continues until the end of the file is encountered.
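The first pass described above can be sketched roughly as follows. This is only an illustration of the table-driven idea, not the decompiler's actual code: the opcode values, the names in `OPCODE_TABLE`, and the operand encoding (little-endian 16-bit words) are all assumptions made up for the example.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instruction:
    opcode: int
    name: str
    operands: Tuple[int, ...]


def parse_generic(data: bytes, pos: int, argc: int):
    """Generic parser: assumes argc little-endian 16-bit operands follow the opcode."""
    if argc == 0:
        return (), pos
    operands = struct.unpack_from("<%dH" % argc, data, pos)
    return operands, pos + 2 * argc


# Hypothetical table, keyed by opcode: (name, parser function, parameter count).
# Real opcodes and formats are not given in the text; these are placeholders.
OPCODE_TABLE = {
    0x01: ("assign", parse_generic, 2),
    0x02: ("call", parse_generic, 1),
    0x03: ("return", parse_generic, 0),
}


def first_pass(data: bytes) -> List[Instruction]:
    """Main loop: read an opcode, look it up, run its parser, repeat until end of data."""
    code = []
    pos = 0
    while pos < len(data):
        opcode = data[pos]
        pos += 1
        if opcode not in OPCODE_TABLE:
            raise ValueError("unknown opcode 0x%02x at offset %d" % (opcode, pos - 1))
        name, parser, argc = OPCODE_TABLE[opcode]
        operands, pos = parser(data, pos, argc)
        code.append(Instruction(opcode, name, operands))
    return code
```

An opcode with an unusual layout would simply get its own parser function in the table, while everything sharing the common format reuses `parse_generic`.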
The second pass works out function/prototype pairings and fixes local variable counts. Because of the way a compiled script works, it is only possible to work out which function prototype belongs to which function body after a call has been made to that function. This stage goes through the interpreted code looking for function calls, and associates a function body with its prototype whenever it finds one. It also works out which variables in the function are locals and which are parameters, since this, again, cannot be determined until the prototype for the function is known. It doesn't actually alter the code in the function to reflect this; it just works out which variables are which. One consequence is that any function which is never called cannot be matched to its prototype, and therefore has to be discarded.
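A minimal sketch of this pairing step, under some loudly stated assumptions: each call is modelled as a (prototype index, body address) pair, each body records only a combined variable count, and the local count is taken to be that total minus the prototype's parameter count. None of these details come from the real file format; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Prototype:
    name: str
    param_count: int


@dataclass
class FunctionBody:
    address: int
    variable_count: int                     # parameters and locals together (assumed)
    prototype: Optional[Prototype] = None   # unknown until a call is seen
    local_count: Optional[int] = None       # unknown until the prototype is known


def second_pass(calls: List[Tuple[int, int]],
                prototypes: List[Prototype],
                bodies: List[FunctionBody]) -> List[FunctionBody]:
    """Pair bodies with prototypes using the calls found in the decoded code."""
    by_address = {body.address: body for body in bodies}
    for proto_index, target in calls:
        body = by_address[target]
        if body.prototype is None:
            body.prototype = prototypes[proto_index]
            # Whatever is not a parameter must be a local.
            body.local_count = body.variable_count - body.prototype.param_count
    # Bodies that were never called cannot be matched, so they are dropped.
    return [body for body in bodies if body.prototype is not None]
```

Note that this step only records the counts; the instructions inside each body are left untouched.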
The third and final pass goes through the code again, this time transforming the code in function bodies to reflect whether a local or a parameter variable is being accessed. This simplifies any later processing.
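To illustrate the kind of rewrite this pass performs, here is a small sketch. It assumes, purely for the example, that variables inside a function are numbered with the parameters first and the locals after them, and that an instruction is a (name, operand indices) pair; the real encoding may well differ.

```python
from typing import List, Tuple


def classify(index: int, param_count: int) -> Tuple[str, int]:
    """Turn a raw variable index into an explicit ("param", n) or ("local", n) reference."""
    if index < param_count:
        return ("param", index)
    return ("local", index - param_count)


def third_pass(body: List[Tuple[str, Tuple[int, ...]]], param_count: int):
    """Rewrite every variable operand in a function body using the counts from pass two."""
    return [
        (name, tuple(classify(i, param_count) for i in operands))
        for name, operands in body
    ]
```

After this step, later stages no longer need to consult the prototype to know what kind of variable an instruction touches.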
Now, we have a huge memory structure, representing the compiled file. The next step will be to optimise the code sequences, and recover more of the original structure, for example FOR loops, and IF/ELSE sequences. However, this part is still under development.
Finally, the memory structure is decoded into a .RUL file, which is written out.
It has also had a large number of functions added to it. To find these, I examined the handy installshield documentation. (I even found some hidden features - see below)
Several functions have been removed from installshield 5, notably the CompressGet family. Installshield have completely revamped their method of installation, and have unfortunately decided to drop support for the previous method entirely.
All this means that you cannot recompile an installshield 3 script with the installshield 5 compiler, and vice versa.
Because the decompiler automatically discards unused functions, installshield scripts tend to halve in size when recompiled. For example, even if you only use one of the SdDialog functions, the compiler includes all of them in the compiled file.
Incidentally, I discovered a hidden feature of installshield scripts: the call statement. It seems you can have subroutines based on call/return as well as functions. I saw one script which used this feature, which prompted me to investigate further. I wonder why they don't tell anyone about it, since it is still in the compiler.
Currently I am developing code to recover higher-level code structures, so that installshield 3 scripts should soon be recompilable too.