+HCU papers
courtesy of fravia's page of reverse engineering
"code optimised in this way will just about blow the door and hubcaps of the slop
that passes for code in most places these days"
SLH
> THE BRIDGE
>
> In Pursuit Of Lost Knowledge
>
> During the French occupation of Egypt, workmen in 1797 found an ancient
> stone with three different styles of inscription on it. It became known as
> the Rosetta Stone. The top style was written in Egyptian hieroglyphics, a
> set of symbols that no one had been able to read since antiquity. The second
> script was known as Demotic script which was partially understood and the
> third style was written in ancient Greek, a language that was very well
> understood.
>
> A young French scholar by the name of Jean Francois Champollion recognised
> the name of Ptolemy, an Egyptian Pharoah of Greek origin in the part of the
> inscription written in Greek. With research, comparison with other sources
> and persistence, he eventually pulled off one of the greatest "cracks" of
> all time, the recovery of the Egyptian hieroglyphics that had been entirely
> lost for over 1500 years.
>
> The capacity to "crack" the language that was fast becoming the "Egyptian
> hieroglyphics" of application programming is just within range of
> application programming through an anachronism that still survives in some
> of the high level languages, the capacity to write inline assembler.
>
> Of the major programming environments that still survive in one form or
> another, none are without hope when it comes to being used as a bridge to
> regain lost knowledge.
>
> Some are just more difficult than others. There are still dialects of C that
> will write inline assembler. Borland in its rise from the ashes of the last
> rape and pillage by Microsoft still support inline assembler in their latest
> version of Delphi 3.
>
> Even Visual Basic 5 is not without hope in this area. Most who have had to
> suffer this abomination have learnt that if you want performance in VB, you
> went looking in the aftermarket.
>
> A small Californian Company called PowerBasic has a product marketed under
> the name of PBDLL50 that allows the basic programmer to write dynamic link
> libraries using among other things, a quality 32 bit inline assembler.
>
> The purpose of seeking a bridge is to use existing skills in the language
> of choice to recover the things that once were common knowledge in the
> programming community, the use of assembly language where performance was
> needed.
>
> Interestingly enough, the assembler notation that is commonly used is two
> layers up from where the action is inside the processor. At the lowest
> level, binary is switching on and off banks of zeros and ones. There is a
> notation for using it but it is truly, only for the hardy. The first usable
> form of opcodes is in the more familiar hex notation that is seen in a hex
> editor.
>
> This is still a relatively difficult way to write binary for while the
> character data occurs in left to right order, because of a peculiarity in
> the design of Intel processors, numerical data is stored in reverse order.
>
> To copy a 32 bit hex value (56 A7 00 FE) into the eax register, you will
> find the opcode, A1 (mov eax) followed by (FE 00 A7 56).
>
> A1 FE 00 A7 56
>
> While this is a high precision area to edit binary, it could only be classed
> as somewhat less than intuitive when it comes to software production.
>
> To aid in the production of software, a higher level language was developed
> to make the production of binary more managable and this is the place for
> assembler language.
>
> In the assembler module, you will have a data type declaration of the type
> DWORD or the traditional notation DD which ensures the data is the correct
> size. If you call it Var1, when you copy that piece of data into the eax
> register, you simply use [ mov eax, Var1 ] instead of [ A1 FE 00 A7 56 ].
>
> The other major simplification is the use of the commands that are referred
> to as mnemonics. The actual opcodes are different depending on the size and
> direction in which data is being copied.
>
> mov eax, Var1 = A1
> mov Var1, eax = A3
>
> The mnemonics give you a far more intuitive way of writing binary without
> any loss in precision. This is in the genuine sense of the term, a high
> level language but The difference is that when you write code in assembler,
> there is an exact correlation between what you write and what you get in
> binary.
>
> The Inline Assembler
> ~~~~~~~~~~~~~~~~~~~~
> Inline ASM notation varies from one language to another yet they are similar
> enough to the known assemblers to function as a vehicle to bridge the gap
> between the corporate "haves" and the aplication programming "have nots".
>
> In C, you use an asm block,
>
> __asm
> {
> mov eax, 0
> etc ...
> }
>
> In Delphi you use the ASM statement,
>
> "asm AsmStatement [ Separator AsmStatement ] end"
>
> which allows the more orthodox looking,
>
> asm
> mov eax, 0
> Statement ...
> Statement ...
> end
>
> In PowerBasic you use the notation,
>
> ! mov eax, 0
> ! etc ...
>
> The place to start trying out inline assembler is in your speed dependent
> code. This can be a very wide range of application types ranging from CAD
> style number crunching to spreadsheet style calculations to graphic image
> processing and many more.
>
> To help you there is a very useful Windows API called GetTickCount. The
> piece of pseudo code following shows how it is useful as a simple but
> powerful performance barometer. Get your function up and going and then
> clock it as follows.
>
> Local tc1 as LONG
> Local tc2 as LONG
> Local mil as LONG
>
> tc1 = GetTickCount()
> Call_Your_Speed_Dependent_Function
> tc2 = GetTickCount()
>
> mil = tc2 - tc1
>
> Convert mil to text and put it in a MessageBox and you will get the bad news
> about what is happening. Run it a number of times to get the fastest time
> because of ram buffering and then start on the speed dependent function. If
> you are writing this process for VB, make sure you put the GetTickCount()
> API call in the DLL, not in VB.
>
> Structured loops are often one of the culprits in wasting machine cycles.
> Although it may not be politically "correct" programming for some, the most
> efficient loop in terms of processor cycles is,
>
> Label:
> instruction ...
> instruction ...
> instruction ...
> etc ...
> jmp Label
>
> The next trick is to look at the exit condition for the loop and see if you
> can optimise it. The following is a mixed API, ASM win95 loop.
>
> FastLoop: |
> Function_Calls <128 bytes |
> |
> cmp x, 0 |
> jnz FastLoop ----------------->
>
> If the length of the code within the loop exceeds 128 bytes, the code will
> need to be modified slightly because a short jump has a limit of 128 bytes
> range.
>
> FastLoop: |
> Function_Calls > 128 bytes |
> |
> cmp x, 0 |
> jz OutOf |
> jmp FastLoop ----------------->
>
> OutOf:
>
> The example loops are limited by the need to use a function call to supply
> part of the compare for the exit condition. If you need to run a loop a
> predetermined number of times, you can get even better gains.
>
> Local Count as LONG
>
> Count = 0
>
> LoopStart:
>
> Do_The_Dirty_Work_Here
>
> cmp Count, 100000
> je LoopEnd
> inc Count
> jmp LoopStart
>
> LoopEnd:
>
> In many areas, loops of this magnitude and much larger are common and it is
> in algorithms that make extended use of nested loops and multiple
> comparisons that will see the greatest gains from what is still very simple
> coding.
>
> If you type it up nicely and indent it properly, it almost looks like
> politically "correct" structured code but the difference is that code that
> optimised in this way will just about blow the door and hubcaps of the slop
> that passes for code in most places these days.
>
> Each time you perform one of these "grubby" little tricks, the performance
> barometer will give you the best news you have had for a long time, your
> algorithm just keeps getting faster.
>
> Most accept that computers crunch numbers very well with their 80 bit wide
> floating point maths data path but few have seen the real speed of integer
> maths written directly in assembler. This stuff is fast enough to make the
> average high level programmer wet themselves.
>
> x = YourVar as LONG
>
> mov ecx, 1 ; put count in ecx
> rol x, ecx ; rotate YourVar left by count in ecx
>
> mov ecx, 1 ; put count in ecx
> ror x, ecx ; rotate YourVar right by count in ecx
>
> This is the really quick way to do integer multiplication and divide when
> powers are involved. Look up the integer maths instructions in the Intel
> or similar literature and you will find some interesting suprises.
>
> These two bit rotation instruction execute on a 486 or later at 4 clock
> cycles of the processor so if your processor is running at 300 meg, it
> takes one 75 millionth of the unit time value to execute.
>
> Start playing with the floating point maths instructions for late model
> processors and your calculations will develop the "eyeblink" problem, blink
> and you will miss them. Calculations that are over as the enter key is
> reaching the bottom of its travel are starting to happen fast enough.
>
> In a single instance one may wonder, why bother, but when you are performing
> calculations for a grid control or a spreadsheet which may have thousands of
> entries, the speed actually makes a difference. The limitations are usually
> related to how long it takes to read the values from their locations and
> redisplaying them.
>
> This form of incremental optimisation is an excellent learning vehicle as
> it allows the high level language user to continue to work in a familiar
> environment without having to start "cold turkey" on an unfamiliar assembler
> without knowing how to write the startup code.
>
> Neither C/C++ or Pascal are slouches in terms of binary generation, its just
> that they were seduced by the VB style of quick 'n easy front end generation
> and it is here in the short term that the balloon part of their code
> generation is not easy to fix.
>
> OOP, not oops
> ~~~~~~~~~~~~~
> Object Oriented Programming at its basics is the encapsulation of well
> written high level packages that can be adjusted with ease and extended if
> there is a need for additional capacity.
>
> A good example is the 16 bit edit control that still graces the current
> bleeding edge 32 bit OS. For a line of code you put up a single line editor
> that has insert or overstrike capacity, input length control, and font
> adjustment by sending it a message. Twiddle the style bit and it becomes a
> multiline editor with either word wrap or normal text display.
>
> Subclass it and trap the keystroke messages and you can produce filtered
> input as the data is being typed into it. This "object" has been around
> for so long that it is old hat but it models what is wrong with the new
> generation of "objects".
>
> Most know the difference between code that just does the job and code that
> is written for reusability. When you find "objects" that just manage to do
> what they were aimed at while depending on complex, multi-dependent
> structures, half documented functions that perform differently to the
> documentation and bugs that take nearly amazing fixups to make them work,
> you get the idea that the turnover was more important to the vendor than
> the performance.
>
> It is about as obvious as tits on a bull that the development time for the
> current version of Win 32 was spent on the glitzy front ends, not on the
> application level "objects" that were supposed to deliver gains in control,
> simplicity and throughput.
>
> The reason why there is rumbling in the ranks of application programming is
> because of their vulnerability to and dependence on the PUKE that is being
> strapped together and sold at inflated prices under the control of a single
> monopoly.
>
> The programmer of whatever language disposition that starts to look at ways
> around the problems has no option but to go lower level. Direct API
> programming still delivers better performance than the bug ridden junk that
> is being passed off as software these days but it takes more work,
> particularly if your language is not C as the windows APIs are straight C
> in their design and calling parameters.
>
> With the standard of documentation supplied by the OS vendor, data
> encryption and function "cracking" is part of the process of writing at the
> API level but most have suffered long and hard enough to know how to work
> this way.
>
> The people who will see the greatest gains in mixed API / ASM programming
> are those unfortunate enough to have to shovel their way through VB 5. Just
> getting your code into a compiler gives enormous gains in speed. Start
> writing inline ASM and your DLL modules will just about burst VB at the
> seams. This is something like an using an Exocet missile to do a hit on an
> oil tanker.
>
> The knowledge that you get from making the effort is yours forever and it
> is far easier to maintain than floundering from one corporate vendors
> spittoon to another. The hardware keeps getting faster and smarter but the
> new instructions that come out from time to time are always written in the
> same language.
>
> After the First World War, the scholarly community had a chance to absorb
> the ramification of Russell and Whitehead's "Principia Mathematica" and one
> of Russell's own pupils, Ludwig Wittgenstein triggered off a movement that
> thought that with the new and powerful logic from Principia that they
> could construct an "Ideal Language" that would be free of the ambiguities
> of the past.
>
> As it wound up, invective and all, rules started to be drawn up about what
> was a meaningful statement and what was not and this developed into what
> was called the linguistic movement that flourished through the thirties.
>
> The work done by Kurt Godel in 1931 that proved that axiomatic systems
> had boundaries that prevented them from addressing statements beyond their
> boundaries that were true, gave weight to the position that logic could
> only ever be the servant, not the master.
>
> The last serious attempt to sell the view of the linguistic movement was
> published by A.J.Ayer in 1948 in a work called "Language, Truth and Logic"
> where the criterion of meaning was determined by whether a statement was
> either empirically verifiable or logically true.
>
> It was trashed at about the time of its publishing by an argument so rabid
> that its author gave up the position. The argument was that the "Criterion
> of meaning" was neither empirically verifiable nor logically true so it was
> meaningless.
>
> This level of abstraction is highly relevant to the design of modern high
> level computer languages. Computer languages are by structure, complex
> axiomatic systems. Rules of grammar, syntax conventions, keywords and the
> like. Flexibility and power gave as much structure as was necessary to make
> code generation easier and more convenient yet left room for additions and
> improvements.
>
> The rise of closed system Integrated Development Environments signalled a
> return to the defective logic of trying to construct another "ideal"
> language which carried all of the problems associated with blunders in
> fundamental logic.
>
> The more rigid and proprietry the IDE's became, the less they were able to
> deliver access to the true statements that were outside of their axiomatic
> structure. If this sounds familiar, it is because the logic of the greatest
> brains of the 20th century is far more powerful than the hype of greedy
> corporate vendors lining their pockets at the expense of a nearly captive
> market.
>
> The abuse of power is no stranger to those in the pursuit of knowledge.
> Some of those who were members of the "Vienna circle" that actively debated
> the notions of an "ideal" language became the victims of men in brown shirts
> who took the books from the universities and burnt them and sent those who
> were not deemed to be racially pure to the gas chambers.
>
> The difference is that it is very difficult on The Internet for any power to
> effectively crush the transfer of knowledge. The movement start only a few
> years ago when the enigma identified as the +ORC started to publish a now
> famous set of essays on reverse engineering.
>
> While some play at discovering the identity of the +ORC, a mind as incisive
> as the author of the essays that have been published will never be found.
> It finally does not matter whether the +ORC is an elderly university
> professor, an angry young man, the best bartender on the block or an insider
> in the corporate world of computing, the important thing is that the
> identity is protected to preserve the movement that has started from the
> essays.
>
> There is some value in addressing the notion of reverse engineering
> software where the source of the code is not available. Recovering the
> code from the days when it was written on punchcards in Cobol or Fortran
> so that it can be fixed and maintained is a necessary if unpleasant task
> for those who have to do it.
>
> Reverse engineering PC based games and applications, functions as the open
> university of the Internet where researchers have put in long hours of hard
> work to bring the knowledge of binary structure back into the public arena.
>
> The area of reverse engineering that should be left alone for ethical
> reasons is the type that was the subject of a protracted and bitter legal
> action between the Microsoft Corporation and Stac Electronics.
>
> Record will show that the Microsoft Corporation was brought to its knees
> for having been caught reverse engineering the Stacker software, stealing
> its design and incorporating it into their own product which it attempted to
> sell for profit.
>
> Theft of intellectual property of this type is by any measure, unacceptable
> conduct on the part of the Microsoft Corporation. It comes as no secret
> that the Software Piracy Association did not bring this action against the
> Microsoft Corporation. It seems that a body of this type condones the
> actions of one of its major clients while mouthing the rhetoric of fair
> play and trading.
>
> If this "front" organisation was serious about protecting the "rights" of
> software vendors, it would actively target the proliferation of smut vendors
> that use the promise of illegal free copies of software linked to the warez
> sites to lure the unwary into their clutches.
>
> The difference is that while trying to sell access to low class porn images
> that debase the sexuality of attractive young women is acceptable to an
> organisation of this type, the dissemination of hard won knowledge which
> cannot be bought is not.
>
> Many legal jurisdictions have methods of processing legal structures that
> are put in place for illegal reasons. Under some readings of restrictive
> trade practice legislation that is in place in various parts of the world,
> a body of the type of the Software Piracy Association is acting in breach
> of statute law by acting to force up or maintain the retail price of
> software on the open market.
>
> The movement that is gathering pace across the Internet cannot in any
> reasonable sense be labelled with the same illegal tag that the Microsoft
> Corporation has to wear as a consequence of stealing the design of the
> Stacker software.
>
> The people who have been dispossessed by the market activities of the
> corporate vendors are starting to be heard in the most powerful of ways,
> they are publishing their knowledge in the difficult areas of assembler at
> a level of quality that has never been seen on an open market before.
>
> The range and quality is truly stunning, operating system analysis and
> design with ASM source code, the inner workings of secret VXD files with
> specialised tools to help, debugging and dis-assembly tools that take much
> of the pain out of reconstructing lost source code and there is much more
> coming.
>
> Finally, the people who make computers into a viable and powerful tool for
> an almost unlimited array of subjects are the most maligned of all, the non
> technical "user". When a couple with children go without their holiday or
> the normal little luxuries to scrape together enough money to buy a humble
> computer for their kids to learn with, it is unreasonable that after having
> spent their hard earned money that it crashes and does not work properly.
>
> The great power in the field of software design and execution is enabling
> those ordinary people to do things that they were unable to do before.
> This is coming with the development of a new generation of software that is
> not just a vehicle for the short sighted and greedy corporate vendors but a
> clear and precise set of tools that will deliver the promise that the
> greedy have already broken.
>
> When you put your hand to the first fledgling instructions of inline
> assembler, you are joining the movement of history by taking an initiative
> that the greedy an short sighted would deny you.
>
> THE BRIDGE
+HCU papers
homepage
links
anonymity
+ORC
students' essays
academy database
antismut
tools
cocktails
javascript wars
search_forms
mail_fravia
Is reverse engineering illegal?