This is a worthy attempt, but the horse has already left the barn. Programmers e...

Homunculiheaded · on Oct 15, 2012

Just as important, nobody actually studying algorithms (where having this misconceptions would matter) actually has these misconceptions. It's completely fine for Big-O to mean one thing when chatting with a bunch of devs and another when sitting down for research.

The distinction between all the Big/Small letters is only really useful if you're trying to prove the bounds and something and proving an upper or lower bound is an easier place to start. Honestly practicing programmers shouldn't even be worried about these type of concerns since in practice there are some very small Ns and very large Cs that can make 2^n surprisingly workable and n surprisingly slow. Speaking in rough terms of 'it grows like this' is just fine.

tylerneylon · on Oct 15, 2012

I am curious to see how many HN-reading developers agree that big-O "really means" or "should mean" an expression of exact complexity (like theta does traditionally). I disagree.

There is an extremely active world of researchers and many (I believe) developers for which big-O definitely and clearly means "upper bound" and not the stronger theta notation. Basically all research papers in CS use O as an upper bound, and those, to me, are the definitive active voices. Beyond that, purposefully confounding O and theta is actually clashing different ideas together - ideas that have been separated for their utility. Often knowing the O notation requires some thought, and knowing theta is even harder; in practice what is most concerning is an upper bound, so that idea is important and useful. If the default object in language became exact complexity (not an upper bound), our lives would be much more difficult -- the practical thing to talk about would end up being whatever new notation replaced the idea of upper bound for complexity (aka the "traditional" big-oh).

sesqu · on Oct 15, 2012

I'm not sure I agree that Θ is more useful than Ο; rather, I think Ο∘E, the expected (time) complexity, is most useful.

Take quicksort for example (see misconception 4). Few people care that it's Ο(n²), and even fewer bother to ask if the implementation is Ω(1). It's not even Θ anything. Yet it's one of the canonical examples of complexity analysis, with the understanding that most people are really concerned with average performance.

btilly · on Oct 15, 2012

Ironic, you just illustrated misconceptions #3 and #4, while trying to educate me about why I was wrong to comment on misconception #2.

All of these notations are just notations for functions, and the functions can have any relation to reality that you want. Here are precise and accurate statements about quicksort using Θ notation. The average and best case performance of quicksort is Θ(n log(n)). The worst case performance of quicksort is Θ(n²).

I maintain that most decent hackers would make those same statements using O notation (which is true), but many would jump to correct someone who said that the expected performance of quicksort is O(n²). According to what is taught in CS, that "correction" is wrong. But given that language is established by usage, they aren't really wrong. And given that arbitrary upper bounds aren't very useful, the common usage makes sense.

Before disagreeing, tell me what value there is in the true statement that the expected run time of all major sorting algorithms is O(nⁿ).

scott_s · on Oct 15, 2012

I can tell you that all of the theorists I talk to say and write O, not Θ when discussing algorithms. I think this is because that the upper bound is generally more interesting, better understood, and more useful when reasoning. In fact, if one of them used Θ, my next question would be "Wait, do we even care about the lower bound?"

I'm not disagreeing with you, just pointing out it's not just hackers. And I think this behavior is because the lower bound generally isn't relevant.

sadga · on Oct 15, 2012

This is why you care about the lower bound:

If I say something is O(n^n)), you may come back and say "I wonder if it is O(n^2)

If you say Omega(n^2), I know not to bother.

scott_s · on Oct 15, 2012

In discussions with a theorist, they would never tell me that something is O(n^n) if it is not also Θ(n^n). My point is not on what people may say, but on what they do say. btilly made a point about programmers, not theorists.

I added the observation that even theorists - the people who are experts in this area - talk like that, and gave a possible reason why. My reason, which perhaps I did not communicate well, is that the lower bound is generally not of practical interest.

omaranto · on Oct 15, 2012

About: "In discussions with a theorist, they would never tell me that something is O(n^n) if it is not also Θ(n^n)". So you've never heard someone say, for example, that Strassen's algorithm shows that matrix multiplication for n x n matrices can be done with O(n^2.808) multiplications?

For some problems, the best known upper bounds don't match the best known lower bounds. Also, in cases where there are further improvements in the upper bounds, it doesn't automatically mean people stop talking about previous upper bounds.

(Ignore this whole comment if you specifically meant the function n^n, I have no examples relevant to that function.)

scott_s · on Oct 15, 2012

I literally meant n^n, which is also what I assumed sadga meant. That is, an upper bound which allows for absurdly fast growth.

I have discussed matrix multiplication's unknown lower bound with theorists, but more as a trivia item because there is a relatively recent result showing that it is O(n^2.3727) (http://www.cs.berkeley.edu/~virgi/matrixmult.pdf). That is, the unknown lower bound was the point of the conversation, and it was interesting because the lower and upper bound are not the same; it's still open research. The other point the theorist made during that conversation is similar to what I've said: getting a tighter bound on matrix multiplication is theoretically interesting, but there's not much practical application.

I was rather talking about when we discuss algorithm when trying to solve problems.

Aardwolf · on Oct 15, 2012

I agree. Nobody is ever going to attempt to write the theta in ASCII or bother typing it given that it's not on most keyboards. Math and practical programming are different worlds, each with their own notations. Programmers are quite used to using ASCII.

Also, if someone says "My algorithm is O(n log n)", doesn't it make sense that he means both the lower and upper bound? Even if technically correct, you wouldn't say "My algorithm is O(n^100)" when it's O(n log n). It's an English language thing...

P.S. I didn't read the article because it's not loading, but can kind of guess the context from your response and noticed this problem before.

B-Con · on Oct 15, 2012

I'm confused by what the article is saying about the meaning of Big-O... FTA:

> Misconception 4: Big-O Is About Worst Case

But it is. Big-O is an upper bound which is by definition a maximum (aka, "worst") case. "f = O(n)" means that at worst "f" will perform linear work, not that it is expected to. He gave this definition himself in the very beginning.

Then he states Misconception #4 and he says that Big-O is not about worst case. He doesn't sound like he is implying that programmers casually use Big-O this way, he seems to be saying this is how the real definition actually is. But it isn't.

tylerneylon · on Oct 15, 2012

Upper bound != worst case, though I can see why the language is confusing.

When we have a timing function f(n) we pretend f has one input, which is n = the size of the input to the algorithm. In reality, it's actually more like f(n, m) where m = a parameter specifying the _exact_ input to the algorithm. (Example: sorting algorithms have many possible input lists of length n.)

So when we write f(n) = O(g(n)), we are trying to make two simplifications: (1) Simplify away the actual input, replaced by only the _size_ of the input; and (2) simplify the actual function f to something that describes f but is much simpler.

I will make up an artificial example that is not perfect but is useful to explain the ideas. Suppose for any input size n we have parameters m in the range [-n-sin(n), n+sin(n)] and the exact timing function

f(n, m) = n + sin(n) + m

Then the first simplification is to determine how to ignore m. We could pick out the best m for n (the smallest f(n, m) for a fixed n), pick out the worst m, or take the average over all m. This is best-case, average-case, and worst-case complexity. For this function, we have:

f(n) = 0 [best case] f(n) = n + sin(n) [avg case] f(n) = 2(n + sin(n)) [worst case]

Then we could get an upper bound on _any_ of these.

f(n) = O(1) [best case] f(n) = O(n) [avg and worst case, not always the same, but they are the same in this example]

If we wanted a lower bound, we could say that

f(n) = Omega(n) [avg & worst case]

because f(n) >= n - 1 for all n (in general there could be more room for flexibility on the right-hand side, but it's not needed in this example).

The point here is that step 1 chooses a simplification across input instances, that we call worse case or average case, etc; and step 2 chooses a simplification in how to write down the function f(n), and this is what we call an upper bound for big-oh, or for theta, both an upper and lower bound.

B-Con · on Oct 16, 2012

Yes. Big-O is the worst case for the input, not for the algorithm. If the input is the expected average, then its Big-O is the upper-bound/worst-case for the average. But if the input is the worst case or just the algorithm itself, its Big-O will be an upper-bound/worst-case for the entire algorithm.

In other words, the original article's point should have been about vocabulary. Big-O is about worst-case, but sometimes it's implied that the Big-O analysis is about the algorithm's average case, not the algorithm's worst case.

btilly · on Oct 15, 2012

Let me try to say it simply.

Big-O is an upper bound on the growth of a function. The function being bounded can (per his item #3) represent anything of interest.

In practice it is common for the function to be the average running time of an algorithm. In this case the upper bound on the growth of that function says nothing about how badly the algorithm might perform upon rare occasion.

His example is quicksort. A naive quicksort sorts n things in on average O(n log(n)) time, but if presented with an already sorted set will take time O(n²).

Another one that I'd offer is a hash. The average lookup time for an element in a standard hash is O(1). The worst case lookup time is O(n).

If I am trying to figure out how long a batch job is that has to do a ton of work, including a lot of hash lookups, I care about the average lookup time, because I do not expect to encounter the worst case.)

If I am trying to construct a hard real-time system that absolutely must complete an operation in fixed time, despite 1000 trial runs that found the hash to be fast enough, the slow worst case scenario should scare me into using a more predictable data structure.

B-Con · on Oct 16, 2012

> Big-O is an upper bound on the growth of a function. The function being bounded can (per his item #3) represent anything of interest.

> In practice it is common for the function to be the average running time of an algorithm.

In other words, Big-O is the worst-case of the input. You do the complexity analysis and then assign it to a Big-O class, the Big-O class represents the worst-case for that given analysis. If you analyze the average, its Big-O class will be the worst-case/upper-bound of the average and you obviously can't expect it to say anything about the actual full algorithm's worst case scenario because, well, you didn't analyze the algorithm's worst-case scenario.

Big-O is worst-case, you just have to know what it's the worst case of. If the input is just "here is the quicksort algorithm", then Big-O would be the upper-bound for the algorithm itself, aka the worst case for the algorithm, namely O(n²).

This seems to be just an issue of convention, does "Big-O" refer to the upper bound on the algorithm or on the average algorithm performance? That's been discussed elsewhere, but it seems like programmers imply the average while the strict definition does not. So my point was that the original article should have just said to be careful about vocabulary, not asserted that Big-O isn't about worst case, since it is.

esrauch · on Oct 15, 2012

It's my understanding that the horse left the barn before Big-Theta was ever invented; Big-O was widely used even in formal papers more loosely and Big-Theta was only in dusty obscure text books until Knuth came along.

mturmon · on Oct 15, 2012

I think you're right. Moreover, the related Landau big-O notation was used in real analysis before the algorithm analysis big-O notation was in use (Landau died a month after Knuth was born, in the thirties).

Related number theorists had notations for big-Omega and little-o as well, but I think big-Theta was pretty much Knuth's idea.

Knuth's summary of the history of these notations is at:

http://www.phil.uu.nl/datastructuren/10-11/knuth_big_omicron...

(Parenthetically, one look at the above link and you know why Knuth was driven to write TeX.)

omaranto · on Oct 15, 2012

I think your "language is defined by usage" is a reasonable position to take, but in my experience this stuff about O really is a misunderstanding: the programmer's I've talked too that use O as Theta when asked what they mean by O will quote the standard mathematical definition, so they really are inconsistent.

anonimo · on Oct 15, 2012

> Given that language is defined by usage

What?! This is not natural language. It's a mathematical symbol with a well-known meaning.

One might argue that a lot of programmers don't need to worry about algorithmic complexity on a daily basis. But anyone who wants to talk about it should get the basics right.

sanderjd · on Oct 15, 2012

Arguing that language of any sort (including the mathematical sort) is not malleable depending on the context of its use is completely futile. This is a perfect example of the more common usage and the more correct usage of a "mathematical symbol with a well-known meaning" being totally at odds with one another.

When confronted with someone using different definitions for their language usage than those you are using you can either argue with them about what the correct usage is or allow that language is not absolute and cooperate to calibrate your terms so that you can understand one another. It seems that btilly has decided to do the latter in this specific case, which is laudable.