Optimization techniques

Measurement
ActionScript optimizations
Flasm optimizations
Double nots
Thanks
Huge bitmaps, unoptimized vectors, badly chosen frame rates, animating many movie clips at the same time, loading large XML files, dealing with tons of editable text, streaming high quality sound, or simply viewing SWF on a Mac: in 95% of all cases, bad SWF performance has nothing to do with ActionScript. Flasm, although "yet another cool tool", is no solution for the above problems. Optimizing with Flasm makes sense for games, 3D engines, path finding, converting large amounts of data: computing things in general. Flash MX 2004 and Flash 8 made things much better here, too, at least for newer Flash Players.

If you're unsure where the bottleneck is, slow drawing or slow calculating, there is a simple trick: make the player or browser window very small and switch anti-aliasing off in the Flash Player. If performance increases significantly, you should probably optimize your graphics or movie clip structure first.

Measurement

Don't try to optimize every single line of your code: you'll just make it unreadable, probably miss some important places, and nobody will ever notice 10,000 hours of your hard work. The key to any optimization is measurement. Bottlenecks are very hard to guess. I used to have plenty of tips here, well tested for Flash 5 and Flash 4, but it just can't work this way because of the current diversity of possible environments. Now we have Flash Player 8, 7, 6, 5 and (still) 4 out there, stand-alone applications, mobile devices, not to mention Windows, Mac and Linux. These are all entirely different species. It means only some general strategies still work everywhere, and you have to measure your particular application on your target environment yourself. But how?

Optimize and test for your target environment. Tests in Flash IDE are very rough estimates at best.

You should differentiate between ActionScript code in FLA and compiled code in SWF. For example, if "Omit Trace Actions" is checked during publishing, traces simply do not make it into the SWF. They do not exist there. The same goes for commented code. #included files are in effect exactly the same as inlined code, since they are first included, then compiled, so there is nothing to test here either. There are no classes in SWF, just functions. The Flash compiler optimizes library function calls with constant arguments too. Calls like f = Math.sin(0.25) or f = Math.max(3,5) are never saved in the SWF; the calculated values are stored instead. Nor will the body of an if statement be stored when its condition always evaluates to false. Some local variables are stored in registers by the compiler, which makes a huge difference. And so on. Before you test, take a look at the disassembly.

Compiled bytecodes for Flash Player 7 are, err, context-dependent. Code inside of function2 benefits from local registers. The same code in a frame will not.

In a standard procedural programming language, most of the program time is spent in loops and in functions or methods called from those loops. In Flash, frame loops and events that are called often or in parallel should be investigated too.

Ben Schleimer's Flasm Profiler (or is it Flash Profiler?) and David Chang's ASProf are attempts to solve the main problem: what to optimize. I don't know how well they work, because I've not used them in a real-world project yet; they are relatively new. The profiler basically tells you execution times and number of calls for every function.

After you've found what's critical, find a better algorithm first, or change your approach in general. Although you could improve the code in the small, low-level optimization should be the last resort.

Do all tests in a defined computer state: freshly booted, no virus scans or internet connection in the background, all other programs closed. Don't move or resize windows during the test, don't do anything. Don't move your mouse, and let it stay over the Flash movie. It does make a huge difference. That's not voodoo: the OS manages your mouse, and Flash isn't the only running process.

Think about graphics. A script is never interrupted, so it's relatively easy to measure. If, however, you start to measure in the first frame and end in the tenth, you measure everything in between and simply don't know what you measure. The result largely depends on the player's mood and generally takes much longer, making your ActionScript test irrelevant. Network requests, gotoAndPlay actions and many other things related to screen refresh are executed asynchronously. Get them out of the measured code parts.

Beware of loop overhead. Short test times are not reliable, since the loop itself takes most of the time. Calculate the time for an empty loop, function, etc. and subtract it from your results. Use big loops, so that the remaining times are larger than, say, 1000 ms. Computers aren't that exact in ms. Generally, try to isolate the code in question from anything else and measure that code only. Of course, try to get other factors out of consideration first: network bandwidth, graphics etc. Otherwise your results can't be compared because of hidden overhead.
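
The subtraction described above can be sketched as follows. This is only an illustration written in TypeScript so that it can be run today; timeLoop and netTime are hypothetical names, and Date.now() stands in for the getTimer() call you would use in ActionScript. A modern JIT may optimize the empty loop differently from the Flash Player, so treat the numbers as a demonstration of the method, not as results.

```typescript
// Hypothetical harness: timeLoop and netTime are illustrative names, not Flash APIs.
// Date.now() stands in for ActionScript's getTimer().

function timeLoop(iterations: number, body: (i: number) => void): number {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    body(i);
  }
  return Date.now() - start;
}

// Time an empty body first, then the code under test, and report the difference.
function netTime(
  iterations: number,
  body: (i: number) => void
): { gross: number; overhead: number; net: number } {
  const overhead = timeLoop(iterations, () => {});
  const gross = timeLoop(iterations, body);
  return { gross: gross, overhead: overhead, net: gross - overhead };
}

let sink = 0; // accumulates results so the measured work stays observable
const result = netTime(2000000, (i) => {
  sink += Math.sqrt(i);
});
console.log(
  "gross " + result.gross + " ms, overhead " + result.overhead +
  " ms, net " + result.net + " ms"
);
```

If the net time comes out at only a few milliseconds, raise the iteration count until it is well above the clock's resolution before comparing variants.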

Mac Flash Player is slow compared to PC. Test on Macs early.

Don't execute different tests together. If you try to compare optimization 1 with optimization 2, give them the same environment. One run --- one test. If you put both tests in the same frame, you start to deal with caching.

ActionScript optimizations

Flash Player 7 and 8 are much faster with ActionScript, both because of player improvements and because of compiler improvements in Flash MX 2004/Flash 8. The latter are only noticeable if you compile for Flash Player 6 or higher though. If that's your target audience, you can now mostly do without Flasm optimizations, because the compiler will use registers anyway. You will still be able to achieve better performance with Flasm, but your first step should be getting the Flash 8 IDE for critical applications.

I used to elaborate on so-called "deprecated Flash 4 actions" here, which are much faster in Flash Player 5 or 6. The worst example: myMC.gotoAndStop() was 25 times slower than tellTarget("myMC") gotoAndStop(). In Flash Player 7 they finally don't seem to make a real difference. To insist on recommending them, I would have to re-do the whole testing, including mobile devices, so to hell with them.

Action blocks are always executed from start to end; no event or gotoAndPlay() will interrupt the execution of other code. That's why any large for loop will hang the player: no screen updates are made while it runs.

Define local variables in functions with the var keyword. Local variables are faster, are generally good practice, and may be replaced with registers automatically if compiled with Flash MX 2004/Flash 8.

eval is something special compared to, say, this or any other ActionScript keyword. In fact, eval is a kind of macro: it doesn't have a bytecode, but simply writes its argument onto the stack at compile time. No doubt it's faster than any method call. Starting with Flash MX, you're no longer allowed to use eval on the left side of an assignment. Use set instead.

Unfortunately, identifier length still matters, even in Flash 8, so choose short names for variables. This can be extended to built-in functions too. Creating the function t = Math.tan and substituting all Math.tan occurrences with t will serve 2 purposes: no additional lookup is made for object Math, then for method tan; and the name itself is shorter. It works only for Flash 5+ methods and functions; Flash 4 functions will slow down. Of course, names of local variables don't matter if they are stored in registers.
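
A minimal sketch of the aliasing idea, written in TypeScript as a runnable stand-in for ActionScript. The names t, slopeLong and slopeShort are made up for illustration. The speed effect described above belongs to the Flash Player; a modern JavaScript engine will most likely show no measurable difference, so the sketch only demonstrates that the substitution leaves results unchanged.

```typescript
// Illustrative only: `t` is the short alias the text describes.
const t = Math.tan;

function slopeLong(angle: number): number {
  return Math.tan(angle); // looks up Math, then tan, on every call
}

function slopeShort(angle: number): number {
  return t(angle); // one lookup of a short name
}

console.log(slopeLong(0.5) === slopeShort(0.5));
```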

The old trick of replacing b = a*4 with b = a<<2 (shift) makes no speed difference in ActionScript.

Flash tries to precalculate constant parts of your expressions. The calculation order results from operator precedence. As Robert Penner noticed, rad = Math.PI/180 will actually store the calculated value in the SWF, while rad = c*Math.PI/180 will not. Conclusion: explicitly set the precedence to enable precalculation (rad = c*(Math.PI/180) in this case).
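
The grouping can be illustrated like this. The sketch is TypeScript rather than ActionScript so it can be run, and the variable names are invented for the example. It shows why the ungrouped form has no constant-only subexpression: multiplication and division are evaluated left to right, so c*Math.PI is computed first.

```typescript
const c = 90;

// Parsed as (c * Math.PI) / 180: every operation involves c, nothing is constant.
const ungrouped = c * Math.PI / 180;

// The parenthesized part contains only constants, so it can be computed once.
const grouped = c * (Math.PI / 180);

// The two results may differ in the last bits because the rounding happens
// at different steps, so compare with a tolerance instead of ===.
const closeEnough = Math.abs(ungrouped - grouped) < 1e-12;
console.log(ungrouped, grouped, closeEnough);
```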

for and while loops show no speed difference. It depends on how you write them. The most optimized ActionScript examples of both, looping down to 0, produce the same bytecode: for (var i = 10; i--;) {} and i = 10; while (i--) {}. The third part of the for loop, absent in my example, is actually in the body of the loop, so you can't compare it with a normal while.
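
To make the equivalence of the two countdown forms concrete, here is a small TypeScript sketch (a stand-in for ActionScript; viaFor and viaWhile are illustrative names). It says nothing about bytecode, only that both loops visit the same counter values in the same order.

```typescript
const viaFor: number[] = [];
for (let i = 10; i--; ) {
  viaFor.push(i); // the test `i--` both checks and decrements the counter
}

const viaWhile: number[] = [];
let j = 10;
while (j--) {
  viaWhile.push(j);
}

console.log(viaFor.join(","));
console.log(viaWhile.join(","));
```

Both loops run the body for 9 down to 0 and stop when the test sees 0.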

Avoid multiple parallel hitTest() calls in events, often seen in games. If the player is killed after any touch with an enemy, and you have 100 duplicated enemy clips, don't include any code in the enemy clip's enterFrame event. Create a new movie clip and put the enemy clip inside it. Then duplicate inside of this parent clip. Now you can check with only one hitTest() whether a collision takes place. If you need to, use some custom math afterwards to calculate which enemy was hit. Since most of the time no collision occurs, you'll make a really big improvement in fps.
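
The structure of this trick, one cheap test first and the per-enemy search only when it succeeds, can be sketched outside Flash. The TypeScript below is an assumption-laden illustration: Rect, overlaps, boundsOf and findHit are hypothetical, and plain rectangle math replaces hitTest() on the parent clip.

```typescript
// Hypothetical types and helpers; rectangle math stands in for Flash's hitTest().
interface Rect { x: number; y: number; w: number; h: number; }

function overlaps(a: Rect, b: Rect): boolean {
  return a.x < b.x + b.w && b.x < a.x + a.w &&
         a.y < b.y + b.h && b.y < a.y + a.h;
}

// Bounding box of all enemies: plays the role of the parent clip that holds them.
function boundsOf(rects: Rect[]): Rect {
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const r of rects) {
    minX = Math.min(minX, r.x);
    minY = Math.min(minY, r.y);
    maxX = Math.max(maxX, r.x + r.w);
    maxY = Math.max(maxY, r.y + r.h);
  }
  return { x: minX, y: minY, w: maxX - minX, h: maxY - minY };
}

// One test per frame against the parent; the loop runs only after that test succeeds.
// Returns the index of the enemy that was hit, or -1 if there is no collision.
function findHit(player: Rect, enemies: Rect[], parent: Rect): number {
  if (!overlaps(player, parent)) return -1;
  for (let i = 0; i < enemies.length; i++) {
    if (overlaps(player, enemies[i])) return i;
  }
  return -1;
}

const enemies: Rect[] = [
  { x: 0, y: 0, w: 10, h: 10 },
  { x: 100, y: 100, w: 10, h: 10 },
];
const parentBounds = boundsOf(enemies);
const hitIndex = findHit({ x: 105, y: 105, w: 2, h: 2 }, enemies, parentBounds);
const farAway = findHit({ x: 500, y: 500, w: 2, h: 2 }, enemies, parentBounds);
const inGap = findHit({ x: 50, y: 50, w: 2, h: 2 }, enemies, parentBounds);
console.log(hitIndex, farAway, inGap);
```

The saving comes from the common case: when the player is nowhere near the group, only the first comparison runs.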

I mostly do not say "3.45 times slower", because comparisons are very context-dependent and exact values will vary. My "slower" just means "noticeably slower, no situation ever makes it faster".

The list is by no means complete, and will never be. Technology may render some points incorrect, again. Please make your own tests.

Flasm optimizations

After you're done in ActionScript, and the code is still slow, you can start to optimize with Flasm. Basically only two meaningful low-level features are not accessible from ActionScript and are therefore the subject of Flasm work: stack and registers.

Let's optimize a simple loop using stack. Our ActionScript is

for (var n=0; n<1000; n++) {
  someFunction(n);
}

Flash compiles this loop to the following bytecodes:

constants 'n', 'someFunction'   // Store all variables in constant pool
push 'n', 0.0                   // Push the string 'n' and starting 0 onto the stack
varEquals                       // Initialize loop counter: n = 0
label1:                         // Start of the loop
push 'n'
getVariable                     // Get the value of 'n' again
push 1000                       // Push loop bound
lessThan                        // Evaluate boolean condition: "n < 1000?"
not                             // Invert: now "n >= 1000?"
branchIfTrue label2             // If "true" is on stack, go to the end of the loop

push 'n'                        // Loop body
getVariable                     // Get the value of 'n' again
push 1, 'someFunction'          // Push the number of args (1) and function name
callFunction                    // function call is made with n as argument
pop                             // Pop the possible function result away: it's unused

push 'n', 'n'                   // Push 'n' two times
getVariable                     // Evaluate 'n' again
increment                       // n+1 on stack now
setVariable                     // n = n+1
branch label1                   // jump to the loop start, unconditional
label2:                         // end of the loop, addressed with branchIfTrue above

What we see immediately is that the n variable is evaluated many times here. The getVariable action is slow compared to stack operations, and n is only used as a local counter. Why not discard n, keep the counter on the stack and use it over and over, thus eliminating all getVariable calls? We also don't need the constant pool declaration, since n will disappear, and the someFunction name will be used only once. The number of jumps can be reduced to one, too. We know we have to call someFunction(0), so there is no need to check the condition at the top of the loop. Look at the optimized version:

push 0                    // No need for double 0.0, integer 0 will do it
loopStart:                // Choosing meaningful name
dup                       // dup the counter: our function will eat it up

push 1, 'someFunction'    // Push the number of args (1) and function name
callFunction              // function call is made with n as argument
pop                       // Pop the possible function result away: it's unused
// Now the counter is on top of the stack again
increment                 // Increment it
dup                       // Dup the counter: condition evaluation will eat it up
push 1000                 // Push loop bound
lessThan                  // Condition evaluation: counter < 1000?
branchIfTrue loopStart    // Jump to the loop start, counter is on top
pop                       // Should remove counter from stack after the loop

We can go even further. If our function, say, fills an array with some calculated values, it makes no difference to do it from 0 to 999 or from 999 "down to" 0. We can eliminate lessThan action in this case, because branchIfTrue is kind enough to convert 0 to false, and all other numbers to true for us.

push 1000
loopStart:
decrement
dup
push 1, 'someFunction'
callFunction
pop
dup
branchIfTrue loopStart
pop

We moved decrement to the top of the loop, because otherwise branchIfTrue would immediately exit the loop if the counter value is 0 and not let us execute someFunction(0).
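
To check the stack bookkeeping of the countdown loop without a Flash Player, the actions can be traced with a toy interpreter. The TypeScript below is a sketch under stated assumptions: the Instr encoding and the run function are invented for this example, and they mimic only the stack behaviour described in the comments above (the function name on top, then the argument count, then the arguments). It is not how the Flash Player is implemented.

```typescript
// Toy interpreter for a handful of Flasm-style actions (hypothetical encoding).
type Value = number | string | boolean | undefined;
type Instr =
  | { op: "push"; values: Value[] }
  | { op: "label"; name: string }
  | { op: "branchIfTrue"; target: string }
  | { op: "dup" | "pop" | "increment" | "decrement" | "lessThan" | "callFunction" };

function run(
  program: Instr[],
  functions: Record<string, (...args: Value[]) => Value>
): Value[] {
  const stack: Value[] = [];
  const labels: Record<string, number> = {};
  program.forEach((ins, i) => {
    if (ins.op === "label") labels[ins.name] = i;
  });

  for (let pc = 0; pc < program.length; pc++) {
    const ins = program[pc];
    switch (ins.op) {
      case "push": stack.push(...ins.values); break;
      case "label": break;
      case "dup": stack.push(stack[stack.length - 1]); break;
      case "pop": stack.pop(); break;
      case "increment": stack.push((stack.pop() as number) + 1); break;
      case "decrement": stack.push((stack.pop() as number) - 1); break;
      case "lessThan": {
        const right = stack.pop() as number;
        const left = stack.pop() as number;
        stack.push(left < right);
        break;
      }
      case "callFunction": {
        const name = stack.pop() as string; // function name is on top
        const argc = stack.pop() as number; // then the argument count
        const args: Value[] = [];
        for (let i = 0; i < argc; i++) args.push(stack.pop());
        stack.push(functions[name](...args)); // the result replaces the arguments
        break;
      }
      case "branchIfTrue":
        if (stack.pop()) pc = labels[ins.target]; // 0 counts as false
        break;
    }
  }
  return stack;
}

const seen: number[] = [];
const countdown: Instr[] = [
  { op: "push", values: [1000] },
  { op: "label", name: "loopStart" },
  { op: "decrement" },
  { op: "dup" },
  { op: "push", values: [1, "someFunction"] },
  { op: "callFunction" },
  { op: "pop" },
  { op: "dup" },
  { op: "branchIfTrue", target: "loopStart" },
  { op: "pop" },
];
const finalStack = run(countdown, {
  someFunction: (n) => { seen.push(n as number); return undefined; },
});
console.log(seen.length, seen[0], seen[seen.length - 1], finalStack.length);
```

The trace confirms the two points the text makes: someFunction receives every value from 999 down to 0, including 0, and the final pop leaves the stack empty.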

As you see, we end up with a pretty clear loop version, which will be much faster than the original Flash version. How much faster depends on what someFunction() does. As the next step you would go there and optimize it.

The best way to learn how to use registers is to compile the same code in Flash MX 2004 for Flash 5, Flash 6, Flash 7, and look at the disassembly. Flash 5 version will use r:0 only, Flash 6 will utilize all four global registers, and Flash 7/8 will add local function2 registers.

Now if your target is Flash 5, you'll see from the Flash 6 code what can be done. For higher targets, the room for further optimization is smaller. But there are still many places where the code could be improved, basically by eliminating useless pops, pushes and branches.

push statements may push multiple values, not just one. Try to merge single pushes into one. That's way faster. You'll have to slightly re-arrange the code to do that.

Registers are faster than variables, but still slower than the stack. Why not keep all the values on the stack so they come to the top just at the moment you need them? The problem is, if you're doing this with 2 or more variables, your algorithm may want to access them in a different order than they're stored. If some value is only required, say, at the start and at the end of your routine, no problem: it happily lives somewhere at the bottom, waiting for its time to come, and lets you work with other values on top. But for often needed values it doesn't work. While we have the swap action to exchange the two top values on the stack, we can't directly access the third. Even if you find some illusionistic approach to access many variables, you'll just slow the execution down with large numbers of swap commands.

Double nots

In certain cases Flash writes double nots in your code. Consider ActionScript code if (a<=b) {...} else {...}

Two inversions are created here by Flash compiler:

push 'b'
getVariable
push 'a'
getVariable
lessThan // a>b?
not // now inverted: a<=b?
not // prepare for branch to the else condition: again a>b?
branchIfTrue elseCondition

As you see, Flash is not very flexible when compiling your statements: it does not change the order of operands in the expression or use another pattern for the if statement. The double not doesn't really make sense. The only purpose here could be an attempt to force type conversion to boolean. The next action you always see in the code, however, is branchIfTrue, and this action does the type conversion itself.
So Flasm will automatically remove those nots in update mode.
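
For illustration, such a clean-up can be written as a small peephole pass. The TypeScript sketch below is hypothetical and is not taken from Flasm's source: removeDoubleNots works on a plain list of action strings and, as an assumption made for safety, drops a pair of nots only when a branchIfTrue follows directly, since that is the case the reasoning above covers.

```typescript
// Hypothetical peephole pass over a list of action strings; not Flasm's own code.
function removeDoubleNots(actions: string[]): string[] {
  const out: string[] = [];
  for (let i = 0; i < actions.length; i++) {
    const isPair = actions[i] === "not" && actions[i + 1] === "not";
    const next = actions[i + 2];
    if (isPair && next !== undefined && next.indexOf("branchIfTrue") === 0) {
      i += 1; // skip both nots; the branch converts its operand to boolean itself
      continue;
    }
    out.push(actions[i]);
  }
  return out;
}

const compiled = [
  "push 'b'", "getVariable",
  "push 'a'", "getVariable",
  "lessThan", "not", "not",
  "branchIfTrue elseCondition",
];
const cleaned = removeDoubleNots(compiled);
console.log(cleaned.join("\n"));
```

A single not in front of a branch is left alone, so the direction of the test never changes.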

Thanks

My very special thanks go to the people on the flashcoders list, whose ideas helped me to a better understanding of optimization and flowed into the above examples:

Rasheed Abdal-Aziz, Ralf Bokelberg, Robin Debreuil, Zeh Fernando, Gary Fixler, Branden Hall, Dave Hayden, Damien Morton, Amos Olson, Robert Penner, Casper Schuirink.

Sources

* http://www.nowrap.de/flasm.html
