On the Order Of: t3knomanser

t3knomanser

On the Order Of

Jun 04, 2009 16:23

Programmers, when they're really concerned with being efficient, will often start talking about "Big O" or "Order". There's a lot of interesting math behind this, but what it boils down to is a very simple concept- how many operations does it take to solve a certain problem, using a certain algorithm? I'm going to simplify things here, ignore the difference between so called "Big O" and "Little O" and not worry too much about the theoretical underpinnings. This also isn't the only metric used to evaluate how efficient an algorithm is, but it's a major one.

For example, if I hand you a deck of cards, and I tell you to find the Ace of Spades, how many cards would you have to look at in order to find it? Assume you start at the top, and keep drawing cards until you find the one you want. In the best case, the very first card you flip is the one you're looking for. That's on the order of one operation, written as O(1). But, that's the best case. What about the worst? If it were the last card you flipped, that would have taken 52 operations- O(52), or, to be really programmery, O(n), where "n" is the number of cards in the deck. By being general, we can extrapolate this to any collection of cards- be it a full deck, less than a full deck, or even a collection of something else entirely- like an address book.

When talking about the "O" of an algorithm, programmers don't like to worry about constants. We don't like "52"- that's far to specific. "n" is much more useful. But similarly, if there were an algorithm with O(2n+1), we're going to ignore the 2 and the +1 and just call in O(n). Doubling it once just isn't a big enough change to really count. It's details and small stuff, and if "n" is sufficiently large, it just doesn't matter.

Now, searching each individual card in the deck isn't terribly clever, but if the deck is shuffled, that's really all you can do. Picking out randomly, going from top to bottom, bottom to top, it just doesn't matter. But if the deck were sorted- now that could be useful. Think about the phone book- it's sorted, and you don't have to search every single name to find the one you're looking for.

If you were writing a computer program to find a name in the phone book, you'd start by going to the name in the middle. If it's the name you're looking for, great, you're done. If not, does it come before or after the name you're looking for? Once you know that- you've just eliminated half of the names. Take the remaining half, and go to the middle. Repeat the previous process until you find the name you want. Each operation (name you look at) eliminates half of the remaining names. Searching a sorted list is very effeceint, and in this case, it's on the order of "lg n". (lg is the log base 2, or "what power do I have to raise 2 to to get "n"). This, by the way, is called "binary search" and is the goto searching algorithm to use in any search.

In our naive search, where we just checked every element, it's O(n). In a deck of 52 cards, that's 52 operations. Not that bad, but what if I had a list of 1,024 items? That's 1,024 comparisons. But if we used the binary search, it's O(lg n)- 10 comparisons. (210 = 1,024). That's a savings of 1,014 comparisons, in the worst case.

Programmers tend to break down algorithms into a few major categories based on their Big O.

Stuff that's O(lg n) is very fast stuff. They scale really efficiently to big sets of data (1,000 elements is 10 operations, a million elements is 20 operations, a billion elements is only 30 operations!).

O(n) isn't as liked, and most O(n) algorithms can be turned into O(lg n) if you're very clever and find ways to cheat (for example, if you need to search an unsorted list, it'll always be O(n), but if you sort it first, even though sorting is expensive, you'll make it up if you search a lot).

Sorting is really expensive, in comparison. The best sorting algorithms are O(n*lg n). That means, for a set of a thousand elements, it would take 10,000 operations to sort. Ugh, but some problems just can't be made any easier. O(n*lg n) is the best you'll get for sorting algorithms, and there are whole classes of problems that fall into this category.

Worst is O(n2), or even larger exponents. This is the demon of programmers. For example, the first sorting algorithm that most CompSci students learn is the Bubble Sort. It's very simple to understand and implement, but it's an O(n2) sort. But, to sort 1,000 elements, you have to do 1,000,000 operations. Algorithms that run in "polynomial" time are ones that programmers hate. It's worth noting that there are a lot of problems that, as far as we know, can't be solved any faster. There's a whole family of problems that are called "NP Complete", which, right now, can only be solved in polynomial time. Whether or not it's possible to find a better way to solve these problems is an open question.

This was brought up today because I'm working on Gravitone, an iPhone instrument that uses gravity to generate music. I took a very naive approach to the problem, and simply cycle through all the masses and objects in the world and apply gravity to every other object. My algorithm is O(n*m), where n is the number of orbs in play, and m is the number of masses. Sort of polynomial, and if I were to try and scale it up so that the orbs could effect each other, it'd be O(n2). Ugh, polynomial time? That's expensive. Or is it?

I did some research in gravity simulation, and found that the best algorithms out there run in O(n*lg n). The one I looked at specifically was called the "Barnes-Hut algorithm", and it's very clever, and something I'll probably use for a different application. It's also fairly complex to implement, and requires a lot of memory, despite being very fast (memory/speed are a common trade off).

It's faster, but should I use it in this application? No!

In my application, I've capped it at 15 orbs in play and 5 masses in play, which means it's going to take 60 operations.

That's 20 objects total, and in the Barnes-Hut algorithm, 20 * lg 20 ≈ 60 operations.

For very small data sets, sometimes, really expensive seeming algorithms are okay. Now, the Barnes-Hut algorithm would be better- it would scale better, and it would allow me to have the different orbs attract each other, and the masses- it'd be a much more compelling simulation, but that's not the point of the application. I don't need the power offered by Barnes-Hut.

All that said, I think making a Barnes-Hut simulation on the iPhone would be kinda neat. Maybe a game, or a different instrument.

programming, gravitone, iphone