This essay is of no use to anyone except those who want to decompile InstallShield scripts.
I will explain here where you can find the code generator tables of the InstallShield script compiler. From the disassembly you will see how a modern compiler encodes its code generation procedures etc.
Decompiling InstallShield scripts is not cracking. It's pure reverse engineering. It may be more: a lesson in how a good compiler is built, how expressions are translated into machine code, how they are executed, how a modern compiler implements token search algorithms and more. It's impossible to cover all these topics in one short essay, but I think it may be a foundation for other crackers and +crackers to delve deeper into this subject (assuming anyone is interested in pure reversing).
Goal of this essay: To give the reader some basic information about the InstallShield compiler.
Future Goal: Someone writes an InstallShield script decompiler :).
>>>Author's note five days later: There IS now an InstallShield decompiler!
Before reading this: Read this essay by NaTzGUL. I'm assuming here that the reader knows what an InstallShield script is and where it can be found (hint: setup.ins ;).
I'm assuming here that the reader has all our tools ready and will follow all the steps described below. I don't attach any listings. If you want to follow along, just load setup.exe and compiler.dll into IDA (one after the other), wait until autoanalysis has finished and save the resulting IDB files.

While reading the above-mentioned essay by NaTzGUL (excellent, btw.) I tried to delve deeper into the compiled script. First I disassembled setup.exe (this is the script interpreter, in case you forgot) and located the command scanner: at 00420E84 you will find the opcode fetch, parameter determination and, finally, a jump to the command service routine. The table containing the addresses of the procs (opcodes 000..1C6) starts at 00495FC8. For each command there is a record:

    byte   parameters_type
    dword  procedure_address

When you have time, name these procedures cmd_xxx, where xxx is the opcode. E.g. at 0049609B you will find a record containing (2, 0042AC02), which corresponds to command 02A - MessageBox. 02C is goto, 02F is strlen, 033 is Exec etc...

This was quite easy to find, but then comes the trickier part: where can we find the real names of the functions? After a little thought I decided to reverse the InstallShield compiler. Of course the compiler must be located somewhere first.

First solution: Get it from www.installshield.com (Lite version).
Second solution: Get it from the Web (you know how to search).
Third solution: Get Visual C++ 5 or another CD containing InstallShield Lite as an added bonus.
Fourth solution: Get it here (DLL only, for reversing).

Our target is now compiler.dll, the main compiler module called from the IDE and probably from the command-line script compiler, which I don't have. An essential part of the compiler is the lexical analyzer, which isn't so interesting for us. More important are the code generator and the token analyzer. Remember that this compiler does not generate pure machine code. It generates scripts which are interpreted during the actual setup process by setup.exe.
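The command table in setup.exe described earlier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are taken to be packed (one byte followed directly by a little-endian dword, 5 bytes in total), and the helper names record_address and parse_record are made up for this example. Under that assumption the record for opcode 02A starts at 0049609A, one byte before the 0049609B quoted above; the quoted address may simply point at the dword half of the record, or the assumed layout may be off.

```python
import struct

TABLE_START = 0x00495FC8   # table start quoted in the essay
RECORD_SIZE = 5            # ASSUMPTION: packed byte + dword, no padding
LAST_OPCODE = 0x1C6        # highest opcode quoted in the essay

def record_address(opcode):
    """Address of the record for `opcode`, assuming packed 5-byte records."""
    if not 0 <= opcode <= LAST_OPCODE:
        raise ValueError("opcode out of range")
    return TABLE_START + opcode * RECORD_SIZE

def parse_record(raw):
    """Split 5 raw bytes into (parameters_type, procedure_address)."""
    return struct.unpack("<BI", raw)

# Rebuild the MessageBox record from the two values given in the essay.
raw = struct.pack("<BI", 2, 0x0042AC02)
param_type, proc = parse_record(raw)
print(f"opcode 02A: record at {record_address(0x2A):08X}, "
      f"type={param_type}, cmd_02A at {proc:08X}")
```

With a real dump of setup.exe you would feed parse_record the 5 bytes read at record_address(opcode) instead of the hand-built record.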
The code generator consists of [deep breath here] several hundred pointers to linked lists of records containing pointers to code generation tables [deep breath end] (sounds nice, eh?). I'll explain it briefly.

Short definition: Token: something that can be a keyword, function name, number, variable name etc. E.g. these are tokens: goto, MessageBox, IS_OS2. If you are a purist, replace this definition with your own.

Assume the compiler has completed the token MessageBox. It searches for a service for it using the nice pointer table contained in 10033408..10033807. Looks like a tree... Almost all of these 256 dwords point to linked lists of structures. Let's see, e.g., where the dword at 10033650 points. Click on the reference and you land at 10031938. Make 4 dwords from the following data and you quickly discover the token table structure:

    address 10031938:                    actual value
    dword *next_list_element_pointer     10031948
    dword *token_string                  "ConfigAdd"
    dword command_flags                  00000201
    dword *token_service_data            1002E898

command_flags is 0201 for all functions and built-in constants; other values in this field mean that the token is a keyword (e.g. 'case' has 0A01).
token_string is simply what we are searching for: the ASCIIZ name of the function or constant.
token_service_data is the data for the code generator. It consists of several words (mostly 3 for constants or 4 for functions), e.g.

- for functions:
    word 2 (this means: token is a function)
    word function_opcode
    word ? (maybe parameter count?)
    word ?
- for compiler predefined constants:
    word 0 (this means: token is a predefined constant)
    dword constant_value

Let's follow the linked list in the above example. At 10031948 we have:

    dword pointing to the next list element
    dword pointing to "MessageBox"
    dword 00000201 (function/constant)
    dword pointing to 1002E8A0

And at 1002E8A0 we have:

    word 2   (function)
    word 2A  (opcode for this function in the script)
    word 2   ???
    word 2   ???

Of course IDA 'Xref clicking' is much faster than reading this step-by-step description.
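To make the record layout above concrete, here is a minimal Python sketch that walks one such linked list and decodes token_service_data. It runs on a tiny synthetic image, not on the real compiler.dll: the base address, the offsets inside the image and the helper names are invented for the example, and the sketch assumes that a list ends with a null next pointer (the essay does not say how lists are terminated). The MessageBox values and the FREEENVSPACE constant (000000CA) are the ones quoted in this essay.

```python
import struct

BASE = 0x10030000  # hypothetical load address of this synthetic image

def build_image():
    """Build a fake image holding two token records (layout is illustrative)."""
    image = bytearray(0x60)
    def put(off, data):
        image[off:off + len(data)] = data
    put(0x00, b"FREEENVSPACE\0")
    put(0x10, b"MessageBox\0")
    put(0x20, struct.pack("<HI", 0, 0xCA))         # constant: word 0, dword value
    put(0x28, struct.pack("<4H", 2, 0x2A, 2, 2))   # function: word 2, opcode, ?, ?
    # record fields: next, token_string, command_flags, token_service_data
    put(0x40, struct.pack("<4I", BASE + 0x50, BASE + 0x00, 0x201, BASE + 0x20))
    put(0x50, struct.pack("<4I", 0, BASE + 0x10, 0x201, BASE + 0x28))
    return bytes(image)

def read_cstring(image, addr):
    """Read the ASCIIZ string stored at virtual address `addr`."""
    off = addr - BASE
    return image[off:image.index(b"\0", off)].decode("ascii")

def walk_list(image, head):
    """Yield (name, flags, service_data_address) for each record in one list."""
    addr = head
    while addr:  # ASSUMPTION: a null next pointer ends the list
        nxt, name_ptr, flags, svc = struct.unpack_from("<4I", image, addr - BASE)
        yield read_cstring(image, name_ptr), flags, svc
        addr = nxt

def decode_service(image, svc):
    """Decode token_service_data according to its first word."""
    kind = struct.unpack_from("<H", image, svc - BASE)[0]
    if kind == 2:    # function: second word is the opcode
        return ("function", struct.unpack_from("<4H", image, svc - BASE)[1])
    if kind == 0:    # predefined constant: dword value follows
        return ("constant", struct.unpack_from("<HI", image, svc - BASE)[1])
    return ("unknown", kind)

image = build_image()
tokens = [(name, flags, decode_service(image, svc))
          for name, flags, svc in walk_list(image, BASE + 0x40)]
for name, flags, info in tokens:
    print(f"{name:14s} flags={flags:04X} {info[0]} {info[1]:X}")
```

Against the real DLL you would replace build_image with the file contents mapped at the proper load address and start walk_list from each of the 256 dwords of the pointer table.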
So, we have found the opcode for MessageBox, which is (surprise?) equal to the one found by reversing setup.exe (see above). Once again, here is a quick way to find the opcode of a built-in function or the value of a predefined constant:

1. Locate the ASCIIZ name of the token you are looking for (Alt-B in IDA). Remember to search for this keyword _exactly_ as it should appear.
2. Click on the Xref - you will land in the linked list.
3. Check if the dword after it is 00000201; if not, you are looking at a keyword...
4. Click on the Xref of the dword below - you will land in token_service_data.
5. The second word is the opcode.

Let's check it again for LaunchApp (should be a simple exec, shouldn't it?):

Alt-B, "LaunchApp" (check case-sensitive, otherwise the search is slow) -> 1002C2FC.
Click on the Xref: 1003198C.
Below we have the dword 00000201 (or press 'D' 3 times to make a dword).
And then a kind of offset - click on it -> 1002E8BE.
The first word is 2 - it's a function.
The second word is 33 - it's the opcode.

BTW, when you return to the disassembly of SETUP.EXE and find the service procedure for command code 033, you will quickly see that we're right...

Author's note five days later:
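The five manual steps above can also be mimicked by a brute-force search over the raw bytes, without IDA. The Python sketch below does this on a synthetic image that places the LaunchApp string, list record and service data at the addresses quoted in this essay; everything else is an assumption for illustration: the base address, the zeroed next pointer, the two unknown service words (set to 0) and the function name find_opcode. It also assumes that the name is preceded by a NUL byte, which is how it enforces an exact match.

```python
import struct

BASE = 0x1002C000  # hypothetical base chosen so the quoted addresses fit

def build_image():
    """Synthetic image with the LaunchApp data at the addresses from the essay."""
    image = bytearray(0x6000)
    def put(addr, data):
        off = addr - BASE
        image[off:off + len(data)] = data
    put(0x1002C2FC, b"LaunchApp\0")
    put(0x1002E8BE, struct.pack("<4H", 2, 0x33, 0, 0))   # last two words unknown
    # record: next (unknown, zeroed), token_string, command_flags, service data
    put(0x1003198C, struct.pack("<4I", 0, 0x1002C2FC, 0x201, 0x1002E8BE))
    return bytes(image)

def find_opcode(image, base, name):
    """Return the opcode of built-in function `name`, or None if not found."""
    # Step 1: locate the exact ASCIIZ name (NUL before and after).
    off = image.find(b"\0" + name.encode("ascii") + b"\0")
    if off < 0:
        return None
    str_addr = base + off + 1
    # Step 2: find the dword that points at the string (the "Xref").
    ptr_off = image.find(struct.pack("<I", str_addr))
    if ptr_off < 0:
        return None
    # Step 3: the next dword must be 00000201, otherwise it is a keyword.
    flags, svc = struct.unpack_from("<2I", image, ptr_off + 4)
    if flags != 0x201:
        return None
    # Steps 4 and 5: follow token_service_data; the second word is the opcode.
    kind, opcode = struct.unpack_from("<2H", image, svc - base)
    return opcode if kind == 2 else None

image = build_image()
opcode = find_opcode(image, BASE, "LaunchApp")
print(f"LaunchApp -> opcode {opcode:03X}")
```

On a real file the pointer search in step 2 may hit unrelated data that happens to contain the same four bytes, so a more careful version would check every match instead of only the first one.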
{
You may experiment with IDA script language to correctly display all these records.
There is no need to experiment further. The decompiler is already written.
}
So... A little bonus for everyone who has come this far with this essay! If you want to see the token table, compile the file below.
And here is a short doc for the above:

indis.exe - InstallShield COMPILER.DLL token table dump - by zeezee

Usage: Place it in the directory where COMPILER.DLL exists.
Works only with the version which is exactly 260096 bytes long and dated 22.01.97. When using another version you must find and change the table start address!

Command line: indis >indis.dat

For each function/constant/variable a record is generated, e.g.:

For predefined values:
    F=0201, O=000000CA, T=0, N=FREEENVSPACE
  where:
    F - flags
    O - constant value (long)
    T = 0 for constants, 1 for variables (?)
    N - name

For built-in functions:
    F=0201, O=0000, P=5, T=2, N=StructGetAddressEx
    F - flags (other than 201 only for keywords)
    O - opcode
    P - param count (?)
    T = 2 for built-in functions
    N - name

Enjoy!
zeezee
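If you want to post-process indis.dat, the record format above is easy to parse. The following Python sketch is one possible way to do it and is not part of indis itself. It assumes that F, O, P and T are all printed in hexadecimal (F and O clearly are; for P and T this is a guess), and the names parse_line and FIELD are invented for the example. The two sample lines are the ones shown in the doc above.

```python
import re

# One "X=value" pair; values run up to the next comma or whitespace.
FIELD = re.compile(r"(\w)=([^,\s]+)")

def parse_line(line):
    """Turn one indis output line into a dict of decoded fields."""
    f = dict(FIELD.findall(line))
    rec = {
        "name": f["N"],
        "flags": int(f["F"], 16),
        "type": int(f["T"], 16),      # ASSUMPTION: hex, like the other fields
    }
    if rec["type"] == 2:              # built-in function
        rec["opcode"] = int(f["O"], 16)
        rec["params"] = int(f["P"], 16)   # ASSUMPTION: hex; meaning is a guess
    else:                             # constant (0) or variable (1)
        rec["value"] = int(f["O"], 16)
    return rec

samples = [
    "F=0201, O=000000CA, T=0, N=FREEENVSPACE",
    "F=0201, O=0000, P=5, T=2, N=StructGetAddressEx",
]
records = [parse_line(s) for s in samples]
for rec in records:
    print(rec)
```

To process a whole dump, apply parse_line to every non-empty line of indis.dat and, for example, build a dict from opcode to name for use when reading setup.ins.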
Greets to NaTzGUL for his brilliant essay.
>>>five days later: ...and for the wisdec decompiler!
zeezee (not +zeezee yet, but I hope I will earn this someday ;)