Generally, slow code is long, fast code is longer, day-to-day code is short. The first happens because the person doesn’t know what they’re doing and fumbles about. The second happens, generally, because you’re unrolling loops and setting up caches, and making code that acts more intelligently.
But in the end, code length is unrelated to code speed.
More importantly, code speed is generally unrelated to algorithm. Most of what slows you down is system calls. Knowing what all sorts of algorithms/data structures exist is certainly a good thing, but focus on writing “simple” code and worry about making it cool later.
Included below is a tutorial on optimization:
Running a profiler, of course, doesn’t hurt but there are some problems as well:
- Profilers often don’t display time that is spent in library/system code.
- Once you know where the slow spot is, that still might not tell you why it is slow if you don’t understand what is happening at a lower level.
- Not all languages/environments have an available profiler.
- Oftentimes, optimal solutions require fairly high level changes. Optimizing the slowest portion of code will certainly speed your program up, but an entirely different approach to everything could, even un-optimized, possibly give even greater speed increases.
At the end of the day, having an understanding of the beast inside the box is the best optimization tool. Of course, remember the two principal rules of optimization:
- Don’t pre-preemptively optimize everything. Write clear, readable code and only when the end result doesn’t perform in a reasonable amount of time, go in and optimize. (Of course, when you understand the machine you’ll likely end up writing clear, readable code that naturally runs quite well.)
- 20% of the code will be 80% of what is being run. Optimizing anyplace else will have negligible results and simply eat your coding time.
The Chef, The Manager, and the Pantry
Deep within your computer, there is a single room,[sup]1[/sup] a kitchen. The kitchen has a workbench in the middle and a door on either side. The door in the front goes out to the restaurant (the rest of the computer) and the door in the back goes into the pantry (RAM).
There is a large timer in the kitchen that counts down from 10 microseconds, and when it reaches zero, the chef takes all of his things, goes into the pantry, and falls asleep. As soon as he closes the door though, it reopens to reveal an entirely different chef standing in an entirely different pantry. And this chef comes out, bringing his things with him, to work until the timer counts down to zero again.
To each of the chefs there appears to be only one pantry and it only contains their things. And none of them can go through the front door to the restaurant. Only the Manager has the key to the front door.
Eeach of the chefs has the same basic skills, and frankly this isn’t much. He can add, subtract, multiply, divide, shift bits, AND, OR, XOR, negate, compare two values, move to a different place in his list of instructions, save the result to the pantry, and grab more data from the pantry. Really there isn’t anything else that he can do, so don’t be fooled! And note that he can only work on a maximum of 32 (or 64) bits at a time.[sup]2[/sup] Again, don’t be fooled! A string is not an int regardless of what Python, PHP, or VBScript seems to represent it as to you.
Another thing to remember is that the chef is fat. His fingers are quite dexterous and nimble for most of his mathematical tasks, except for multiplication at which he is a tad slower, and at division for which he is significantly slower. That being said, he still quite fast with numbers. The problem is that since he is a bit overweight, any time he has to go to the pantry, it is not at a brisk run. More a waddle.
Intel is aware of the propensity for chefs to be overweight, so they have given him a shelf to put a good many of his recipe instructions on, and several more shelves to hold all the numbers he is currently working on, all nicely just over his workspace so that he does not have to go back to the pantry so often.
But of course, every time the timer counts down to zero, he has to take his personal stash back to the pantry and place them where they are meant to be, and then one-by-one take them all back out when has been awoken again.
The pantry, now, is a rather interesting place. Most specifically its layout.
Starting at the very bottom on the first shelf, there is the recipe. All of the steps that he needs to follow are nicely laid out for him. But the chef only takes a few hundred or so at a time, setting them up over his workspace, since it is rare that he will need to go that far in a recipe without having to skip to an entirely different section. But every single page of the recipe is locked into a glass case. He can read it just fine, but he cannot modify it.
Moving up the pantry, just past the recipe pages, are various bits and bytes and whatnot, also all locked up in glass cases. The chef can take these to look at, but he cannot modify them because of the glass casing. For instance, when he has an instruction like:
(EXAMPLE A)
String s = “Hello world!”;
The “Hello world!” string will be stored in one of these glass cases. So that you can modify it, the chef must copy the letters into somewhere else first. In the above command, he does just that.
Now, moving further up in the pantry, is a section for items which are not in glass boxes. These are items which are already known to exist before the recipe begins and that will exist until the recipe ends. For instance:
public void do_something() {
static int i = 10;
...
}
Your variable “i”, being a static value, will exist for the duration of everything. So it’s not worthwhile to create a temporary i and copy 10 into it each time. Thus it will be set up in this area of the pantry. Global variables would be here as well.
Now the whooooole middle of the pantry is open space. We might put stuff in there later, but for the moment we’re going to shift our attention to the very end of the top shelf.
…But let’s take a slight detour before we get to that.
Now the sort of work the chef does generally boils down to taking a bunch of ingredients, working a bit of magic, and returning out a single, completed, dish. This dish might then be a component in an even bigger dish, all of which at some point ends up as the total full course meal for all the patrons, but the point is that in general he:
a) Works on small bits of the whole at a time and progressively combines those into the larger whole.
b) For any stage of work, we’re going to have a bunch of ingredients that get processed and merged down into a single item, making all the scraps and waste product immediately trashable.
So realising that this is a decent way to work on stuff, the chef keeps things arranged in a particular way starting from the very end of the top shelf, which we will call the Stack.
Say that I have this function:
int do_something() {
int i = 10;
int j = 42;
return i * j;
}
Well. the chef is only going to need to keep i and j around for a little bit and then he can trash them. So what he does is:
- Puts a bookmark like thing at the top of the stack that will stick out noticeably.
- Creates i and j just after it.
- Determines the result.
- Puts the result in a bowl on his workbench, and then throws away everything on the stack after the last bookmark.
To show how this is handy, we’ll look at a version like this:
(EXAMPLE B)
void main() {
int a;
int b = 2;
a = do_something()
b += a;
}
int do_something() {
int i = 10;
int j = 42;
return i * j;
}
To watch what the chef does:
- Places a bookmark.
- Creates 2 ints worth of room after the bookmark.
- Sets the value of the second-after-the-bookmark to 2.
- Places a bookmark.
- Creates 2 ints worth of room after the bookmark.
- Sets the value of the first-after-the-bookmark to 10.
- Sets the value of the second-after-the-bookmark to 42.
- Brings the first and second to his workbench.
- Multiplies them.
- Puts the result in the result bowl.
- Throws away everything after the last bookmark.
- Takes the contents of the result bowl and sets into into the first-after-the-bookmark slot.
- Takes the first and second items to his workbench.
- Adds them.
- Sets the second-item-after-the-bookmark to the result.
- Tosses everything after the last bookmark.
Now the important thing here is that his recipes can simply refer to local variables as per their distance from the bookmark, and that’s what the page on main() will do, and that’s what the page on do_something() will do.
But the Stack is, obviously, intended for things which have a short lifespan. And since the whole point of a recipe is to build large items via the processing and merging of many many little items, there’s a general correlation between size and lifespan. This is so true that in general compilers simply assume that anything larger than 32 (or 64) bits is likely to have a long lifespan, and so it will not put it on the stack.
And in fact, there is a maximum size that the stack can become, which assumes and necessitates that the above is so.
So if we go back and look at EXAMPLE A, our string is definitely longer than 32 bits. Point in fact, any string will be assumed to be longer and so will not be placed on the stack (by Java.) But of course we need to keep it in the pantry somewhere!
This is where that big empty section in the middle of the pantry comes in. This area is called the Heap.
The Heap does not have a clear method of creating and trashing data. With the stack you just add and remove from the end. But in the heap, something created in the first step of the whole meal might be the first thing trashed, and it might be the last. So the chef has a recipe book for managing the Heap. It is similar to the process of creating and deleting files from a hard drive. Managing it is, in essence, a program itself.
But now think back a little bit. The recipe was written before the chef began to work and the pages are held in glass cases. If page tells him that something is two off from the top stack bookmark, he knows where that is. And if tells him that an item is the 23rd in the read-only section of pre-initialised data, he knows where that is as well. But when the recipe was written, there was no way to know how large the pantry is, nor to know what program this chef happens to use to manage his heap, so there’s no way for the recipe to tell him where anything on the heap is or should go. And since the recipe pages are unchangeable, he can’t write it down there.
So instead what he does, is he has “pointers”. We can think of these as little tokens 32 (or 64) bits in size, with a long tether coming off them. Any time he creates something on the heap, he adds a token to either the stack, the static data area, or his results bowl and ties the other end of the tether to the thing in the heap. He can thus pass the token about without having to be moving large quantities of data all the time, and not overflow the stack size limit.
Right, so that’s the basic layout of the kitchen.
[sup]1[/sup] With multiple core processors, one can now view it as that there are multiple kitchens, though in general, unless a program is multi-threaded it will be fully run on a single core so far as I understand it. Different programs running at the same time will be run on separate cores though.
[sup]2[/sup] Ignoring the floating point or MMX processors or such, since this warning is mostly concerning operations on strings.
The Chef and Branches
As I said way back, one of the very few things the chef can do is jump to some other page or line in his recipe book. This is called “branching”, due to it most often occurring when there is a multiple choice involved.
As noted before, the Chef generally grabs a few hundred instructions from the pantry at a time. When he runs out, El Fatso waddles on back and grabs a hundred more.
But whenever he branches, being sort of stupid, he assumes that the place he is jumping to in the recipe will not be in his personal cache, he always goes back to grab more. Waddle waddle.
This is not awful but, when you really need to put the petal to the medal, doing some odd math rather than setting up a nice if-else -f-else chain might very well speed Tubby up.
Branches:
Trinary operator ? :
If-else if-else
Function calls
Loops
The Chef and Functions
As already mentioned, whenever a function is started, a bookmark is placed, some amount of space created at the end, and variables with values initialized. For each of these steps, the Chef waddles to the pantry, does it, comes back to his workbench, and then goes back all over again. He is really only any good at keeping one task in his head at a time.
Functions which have parameters also have those parameters pushed onto the stack before the function is called. In general, this is done individually. Which, most likely, means that you are going to see the chef waddling back and forth for each time.
To exhaustively go over the process of entering and exiting a function:
- (Bookmark) Place a pointer token to the old bookmark.
- (Bookmark) Place a pointer to the instruction following the function call in the old function.
- (Parameters) Individually place parameters to the new function on the stack. (If the function is declared in an object, there is a hidden parameter which is a pointer to the object’s data in the heap.)
- (Recipe Cache) Fetch the instructions for the function recipe
- (Local Variables) Create space for local variables on the stack
- (Local Variables) Individually initialise each variable
- Function contents →
- (Result) Optionally place result in the result bowl
- (Reset Bookmark) Reset to the old bookmark (current function’s data remains as it is, to be overwritten by the next function to use that space.)
- (Recipe Cache) Send the chef to grab the page for the old function starting just past where it called us.
Overall not a horrendously laborious process, by the chef’s standards, but still quite a rather involved process. Definitely worse than a simple branching.
The Chef, The Manager, and System Calls
If you recall back, the chef has a ten microsecond work window and then he has to trade with someone else’s chef. Now there’s not anything you can do about this in general. But there is a case where this all becomes relevant.
You see, of course, you want to use as much of your 10 microseconds as you can. If you give it up early, you’re already out thousands of instructions you could have done.
A System Call is what happens when the chef needs something from management. This means anything he does that involves anything beyond his own pantry (application space.) Keyboard, mouse, monitor, hard drive, printer, or even other chefs’ pantries! The chef is persona non grata in all of those areas, so far as Management is concerned.
When the chef wants to do something with any of these, he has to ask the Manager to do it for him. So what he does is:
- Allocates memory for the result of the operation on the heap.
- Picks up all of his things and moves them to the stack. (For instance, all of the fancy stuff he does with the stack might never actually reach the pantry since he can virtually do it on his personal shelves and just move stuff there in case the timer goes.)
- Writes a note saying what he wants, with a token pointing to the memory he left open in the heap.
- Turns on a klaxon alarm.
- Goes into the pantry, closes the door, and goes to sleep.
- The manager comes out of the pantry door (his pantry includes everyone elses), turns off the klaxon, and looks at the note.
- The manager goes in and grabs stuff from someone else’s pantry or goes out through the front door and accesses the printer or wherever.
- The manager follows back the heap token and writes the result.
- The manager goes back into his own pantry, closes the door, and goes to sleep.
- Some entirely different chef comes out and does 10 microseconds of work.
- Yet some entirely different chef comes out and works.
- etc.
- Our chef wakes up, opens his door, and starts returning his data and instructions to his work area, and goes back to work from right after where he left off.
Note that all of this also includes the basics of a normal function call as well. Plus, more importantly, is that if you think the Chef waddles…he doesn’t have anything on the hard drive or the printer, or pretty much any other operation except shared memory. These will all be sloooow. Sloooooooooooow.
Hidden Whoopsies
A thing to recall is my warning back at the beginning. The Chef only has a few limited abilities and anything else it may look like he is doing is most likely a function call under the hood.
If, in some language or another, you see code like this:
switch (s) {
case "Arugula":
...
break;
case "Wombat"
...
break;
}
Well, remember, the Chef can’t do string compares. He can only compare values up to 32 (or 64) bits and most likely is processing each byte of the strings individually. This means that you have a loop, which means an instruction cache refresh (waddle waddle), besides the loop itself. And since it’s unlikely that the compiler/interpreter would copy the body of a full string compare loop around multiple times, it’s almost definitely going to be inside a function behind the scenes. So now we have a loop, a buffer refresh, and a function call. Waddle waddle waddle waddle.
While as if you use integer constants, like so:
final static int ARUGULA = 0;
final static int WOMBAT = 1;
switch (i) {
case ARUGULA:
...
break;
case WOMBAT:
...
break;
}
The speed difference is going to be immense. (Plus a typo will turn into a compile time error rather than a runtime error.)
Another hidden whoopsie I recently saw was related:
print "#" * 42 -- repeats a character 42 times
Similar to the above, while it may look like this is a single line of code and so a single instruction for the Chef, internally it will be a function call, loop, and likely a heap allocation or 6.
Remember what the chef can and can’t do and how he would really have to be going about doing your one line of code!
Rankings of Badness
How bad (I think) the various things discussed are.
Don’t want in the deepest loop at all cost
#1 System calls - Slooooooooow!
#2 rand() - Wasn’t discussed, but many people are taken in and think that a pseudo-random number generator would be speedy and end up putting them in critical portions of code.[sup]1[/sup]
Try to minimize where possible
#3 Function calls
#4 Division
#5 Branches
#6 Multiplication
What you want
#7 Everything else
[sup]1[/sup] What I will often do is to cache the result of a random number and peel off as many bits as I need. Usually you only need a number between 0 and 99, for instance. That’s manageable in just seven bits. So if you divide your 32-bit result from rand() by 7, you can get 4 valid numbers from one call (just keep skipping bits left until the lower 7 are between 0 and 99, then skip seven and start again.)