
"Premature optimization" is another hoary old argument that gets dragged into this debate every time it comes up. But I don't think it's valid.

C programs start up faster than HLL programs. They run faster. They consume less memory. They tend to be more responsive.

From "looking up an object by its string name" to "splitting a string on the comma character" to "running the following functions on a 10 Hz timer", simple operations that all programs do have less overhead in C than they do in many high-level languages, because the C program isn't creating and disposing of hundreds of little objects every time it does one of those things.

Hey --- I write most of my code in Ruby. I'm not saying you should use C. But you shouldn't make up bad reasons not to use it when there are so many good reasons to cite.



Small C programs start up faster than HLL programs. Small C programs run faster.

When they get bigger, they quickly bloat up to compensate for how hard C makes it to write large-scale software. Oh, and it takes a lot longer to modify those C programs, for the same reasons.

C's malloc() is also a lot slower than allocation in GC'd languages. For them, a memory allocation isn't much more than a subtraction.


First, cite sources for specific cases where a mainstream malloc() is "a lot slower" than a specific GC'd allocation in a mainstream HLL. This rings more truthy than true to me.

Second, fixing malloc slowness is among the easiest and fastest optimizations you can make in a C program (in most cases, a pool and freelist will get you 90% of the way there), and no GC'd allocator is faster than pool and arena allocation (which is usually just a couple ALU operations in both the alloc and free cases).

Third, large C programs tend to start up faster than large HLL programs. For instance, Eclipse is slower to start for me than Firefox --- and if you're thinking of Firefox as your archetypal "big bloated C++ program", remember that Firefox is by design the most complicated application most people ever run.

To make that argument work, you have to find a HLL program of comparable complexity. You didn't do that; you just imply that one exists. I'm calling you out on that.


> First, cite sources for specific cases where a mainstream malloc() is "a lot slower" than a specific GC'd allocation in a mainstream HLL. This rings more truthy than true to me.

> Second, fixing malloc slowness is among the easiest and fastest optimizations you can make in a C program (in most cases, a pool and freelist will get you 90% of the way there), and no GC'd allocator is faster than pool and arena allocation (which is usually just a couple ALU operations in both the alloc and free cases).

In a semispace or generational GC, allocating memory is just bumping a pointer.


You're not counting the cost of deallocation.


Not an easy thing to count globally. Your GC cost (at least in mark-sweep) is dependent on the number of objects you're not deallocating. The rest are implicitly destroyed.


You can't decouple the cost of malloc() from the cost of free(); if malloc has to do any more work than simply bumping a pointer (or, in the general case, grabbing a mutex and then bumping a pointer), it's a concession to free() (or to defragmentation, a side effect of free).


Semispace (and generational) collectors do decouple the cost of allocation from the cost of collection by only considering the set of live objects. They also compact (defragment) as a side effect of collection.


In the case of a minor collection, you don't do a full mark-sweep. You only look at a small fraction of the live objects. Generally you only need to trace objects in the youngest generation which are reachable from GC roots. Since you expect most young generation objects to be eligible for collection, this is extremely fast.

It is unusual for a lot of references to exist from older generations to the youngest generation, but when this does occur there are data structures (such as card tables and remembered sets) for detecting and resolving these references very efficiently.


In the collectors I mentioned, deallocation is a matter of moving the live objects to the other semispace, and considering the "current" semispace to be garbage.

If you're allocating a lot of short-lived objects, then explicitly keeping the live objects is much more efficient than explicitly culling the dead objects.


> just bumping a pointer

If only. You have to check for out-of-memory, and throw an exception if so. (This check can sometimes be optimized away, though). You also have to mark the size of the allocated block, so the garbage collector knows how many bytes to copy when it evacuates the object.

Also, garbage collectors usually allocate from a shared memory pool, so every allocation also involves some mutex operations.

Allocation is bumping a pointer in theory only.


> You have to check for out-of-memory, and throw an exception if so.

You have to check if there is space for the allocation in the current space. This is two native instructions, a compare and a conditional branch. If the allocation fails, you don't throw an exception, you launch a minor collection.

> You also have to mark the size of the allocated block

Yes and no. When an object is initialized, type information is recorded that contains the size. I don't think that the Java VM explicitly stores the size of the allocation somewhere. It certainly doesn't need to since the same information is available by just looking at the object itself.

> Also, garbage collectors usually allocate from a shared memory pool, so every allocation also involves some mutex operations.

In the Java VM this problem is addressed by dividing the Eden space into chunks where each chunk is private to a single thread.

> Allocation is bumping a pointer in theory only.

An object allocation in Java is around 10 assembly instructions. More than just 'bumping a pointer' but not a lot more.


Actually, Firefox is in large part an HLL program: it's substantially a JavaScript/XUL engine, and the Firefox frontend is written in JavaScript, AFAIK. Maybe someone has more precise information?

1 and 2 were already commented on, so I'll just add that there was a proof that a copying GC, given enough memory, can be faster than manual allocation. http://www.cs.umass.edu/~emery/pubs/04-17.pdf


The paper you're citing doesn't contain the evidence you claim it has, and in GC vs. "manual" studies, you have to make sure they're not comparing GC to malloc(). Malloc has design goals other than speed. Performant code very often replaces malloc.


You're right - I made a stupid copy&paste mistake with both documents open :/

This is the link I was thinking about: http://www.cs.ucsb.edu/~grze/papers/gc/appel87garbage.pdf


Doesn't this paper basically say:

* Garbage collection is faster than manual frees when you provide processes with so much memory that collection happens so rarely that its cost is lost in the noise, and

* Garbage collection is faster than manual frees when you ignore all the constant factors in both collection and manually freeing, for instance the cost of taking page faults?


Yes - that's true.

But it also gives you additional information: how to check whether the cost can be ignored and whether it will be lost in the noise. And that's important, because in serious VMs you can often control the GC parameters. That means you can make sure that those asymptotes are as close to reality as possible for your exact problem.

It's the same story for hashed collections. You can find big-O costs for their operations, but they're amortised costs. Sure - once in a while you have to realloc / rehash / copy the table, depending on the implementation - but then you know how to judge whether that collection is OK for you. You can tune the starting size precisely in order to lose the rare cost in the noise. The trick is knowing the limits.

Yet somehow many people who love hashmaps hate GC...

You're right about what the paper says, it just didn't spell out the reason why that info is useful ;)


Just from the title alone, I can tell I'm going to like this paper :)


"Premature optimization" is just another way of saying "use the right tool for the job." Look at Mercurial. It's not quite as fast as Git, but it's just fine for its purpose. Most of it is written in Python, with some speed-critical parts written in C. This strikes me as using the right tool for the right job at the right time. Write most of the app in something like Python. This makes it easier to write concise, understandable code that reads like pseudocode in a short time. When you have the system running correctly and get it factored in a way you find nice, then rewrite the speed-critical parts of it in C.

Having a running model of your system to profile is one of the best tools you can have for optimization.


As true as that is, it's orthogonal to the actual debate at hand. There are plenty of good reasons to pick Python, or Java, or Scala, over C/C++. But this article claims performance is one of those reasons, and then crafts a particularly leaky argument to carry that claim.


I read the article more like: "Writing a program in C is no guarantee of having a fast program. Here are a few examples." (Of both places where C is not the best for speed, and other languages giving good results.)

I think your idea of "the actual debate at hand" is too narrowly focused. As a reader of HN, I'm generally interested in how to get things done in a reasonably efficient manner. I find that "get it running, profile it, understand it, then optimize" is a good thing to think about when discussing performance. It cuts through a lot of delusions people have about their ability to write fast code. (Especially when they try to do it up-front.)


And notably, Mercurial, written in Python, is much faster than Subversion, written in C.


Apples and oranges much?


Yes. Apples also can be compared to oranges by several different criteria.


Remember that Mercurial is a dvcs.


That's an example of how architecture trumps language.


> C programs start up faster than HLL programs. They run faster. They consume less memory. They tend to be more responsive.

That's working C programs vs working HLL programs. And, by those metrics, assembly language programs are better still.

> Hey --- I write most of my code in Ruby.

Which suggests that performance of working programs is not the only important factor.

Yes, "premature optimization" is arguably the wrong term for making that suggestion, but it is quite valid.


The data in the article mentions that the startup times for the C, C++, and OCaml programs were in the noise. So there's at least one data point which suggests that HLL programs do not necessarily start slower than C programs.



