Thursday, April 21, 2011

Counting in loops is bad, m'kay...

So you're working in php, and need to loop over something. While there are many ways to do this using loops such as...

do{
    // do stuff
}while( $x < 10 )

while( $x >= 4 )
{
    // do stuff
}

foreach( $array as $key => $value )
{
    // do stuff
}

for( $i = 0; $i < count( $array ); $i++ )
{
    // do stuff
}

...there are some things you should keep in mind. One of the more common mistakes people make when using loops, is placing unnecessary processes in your loop, such as count().

for( $i = 0; $i < count( $array ); $i++ )
{
// do stuff
}

The problem with using count() in your loops is pretty obvious when you think about it. For each loop, the interpreter has to make these steps:
1) Evaluate your conditional...

$i < count( $array )

2) ...but when the interpreter gets to count( $array ), it has to stop to evaluate because it doesn't know how to compare the integer represented by $i with the function count()...

count( $array ) 

3) ...so that it can make an actual comparison. This way, the interpreter is no longer comparing...

$i < count( $array )

...but rather two integers such as...

1 < 5

So what do we do to fix this? Simple, abstract the count function outside of the loop like so:

$count = count( $array );
for( $i = 0; $i < $count; $i++ )
{
   // do stuff
}

Now that count() is abstracted out of the loop, the interpreter is now able to change...

$i < $count

...in one step, to this...

1 < 5

So what's the point of changing this? Try as much as a 300% speed difference! Don't believe me? Try the code yourself.

3 comments:

  1. Hooray for optimizing compilers?

    ReplyDelete
  2. //cleaned up further
    for( $i = 0, $count = count( $array ); $i < $count; $i++ )
    {
    // do stuff
    }

    ReplyDelete
  3. I don't think there's much explanation here...

    I don't know much about PHP, but presumably the issue here is that evaluating count() takes considerable time. If count() were like len() in python, this wouldn't really make a difference because len() is very very fast for pretty much all built in types. So you cache the value - trade a little memory for a lot of computation time.

    I'm guessing that count actually counts all the elements, making it an O(n) operation. This means that every loop, which has n iterations, takes O(n^2) time. As much as a 300% speed difference? If n is 1, the speed difference is nothing. If the interior of your loop is O(1), then as n approaches infinity, your speed difference will approach infinity. This contrasts to the len() function in python, which is usually just returning a counter index that is built in to most types, making it O(1) and very fast.

    Also, the code is not functionally equivalent. If your count could change within the loop (in unpredictable ways that you can't measure without recounting), then you might want to actually reevaluate count every time.

    ReplyDelete