Literate Jenks Natural Breaks and How the Idea of Code is Lost

randomdrake · on Feb 18, 2013

Very interesting read. As someone who struggled through the implementation of Jenks[1] a few years ago in PHP, I can attest to the fact that it was extremely difficult to find any implementation of Jenks that was understandable and clear. I was able to find some examples of code, but nothing could explain what was actually happening and, as you said, those examples had either no comments or poor ones.

What I was able to find were a couple of different explanations of the mathematics. Instead of trying to decode some crappy programming implementations, I spent my time teaching myself the math to understand what the representation of the process[2] actually meant. From there, I was able to understand the iterative process involved and dream up a working solution.

Instead of concluding programmers should be better or knowledge should be more free, I came to the conclusion that I just needed to understand the problem and the math behind it better.

Could it be that maybe the assertion regarding the problem of availability is actually just a problem of lack of understanding or desire for proper comprehension? I'm not accusing the author of laziness, merely wondering if a different conclusion and solution could be arrived at with an alternate point of view on the problem.

[1] - https://github.com/randomdrake/jenks - feel free to check out my implementation. I wrote it a few years ago, so it's probably not the greatest, but it's commented, and it worked fine and fast.

[2] - http://randomdrake.com/jenks.gif

tmcw · on Feb 18, 2013

Hey - I should have mentioned your implementation - I stumbled upon it and was like "whoah, this guy actually wrote this from scratch" :)

In this case, it's a combination - the algorithm Jenks arrived at is not just the math implemented directly, but a clever solution that (afaik) has not been expressed in pure-math terms.

randomdrake · on Feb 18, 2013

Heh, glad you found it entertaining.

This project was very unique and fun for me.

Interesting that you mention the "combination." When I was able to find the aforementioned image that showed the Jenks method, I knew I had to understand what it meant and what the symbols were. I found it very cool that the mathematics were simple and it was the process that made the method work so well.

Basic exponents and algebra were all that were really needed, but the magic was in the method; much like good programming.
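To make the "simple math, clever method" point concrete, here is a minimal Python sketch of the underlying idea. It is not the PHP code linked above or any other implementation mentioned in this thread, and it is not Jenks's original iterative procedure: it assumes the common formulation in which the goal is to split sorted values into contiguous classes that minimize the total within-class sum of squared deviations, and it solves that with straightforward dynamic programming. The function names `sum_squared_deviations` and `natural_breaks` are made up for illustration, and the code favors readability over speed.

```python
def sum_squared_deviations(values):
    """Sum of squared differences from the mean: the only 'math' involved."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def natural_breaks(data, n_classes):
    """Split data into n_classes contiguous groups (after sorting) that
    minimize the total within-class sum of squared deviations.

    Illustrative sketch only: recomputes each class cost from scratch,
    so it is far slower than a tuned implementation.
    """
    values = sorted(data)
    n = len(values)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(data)")

    inf = float("inf")
    # cost[k][i]: best total cost of splitting the first i values into k classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    # split[k][i]: start index of the last class in that best split
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0

    for k in range(1, n_classes + 1):
        for i in range(k, n + 1):
            # Try every possible start j for the last class values[j:i]
            for j in range(k - 1, i):
                candidate = cost[k - 1][j] + sum_squared_deviations(values[j:i])
                if candidate < cost[k][i]:
                    cost[k][i] = candidate
                    split[k][i] = j

    # Walk the recorded split points backwards to recover the classes
    classes = []
    i = n
    for k in range(n_classes, 0, -1):
        j = split[k][i]
        classes.append(values[j:i])
        i = j
    classes.reverse()
    return classes


print(natural_breaks([1, 2, 3, 10, 11, 12, 50, 51, 52], 3))
# → [[1, 2, 3], [10, 11, 12], [50, 51, 52]]
```

The arithmetic really is just means and squares; the work is in the bookkeeping that avoids trying every possible combination of breaks.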

The insight into the power of elegant processes wrapping simple mathematics is something I've repeatedly experienced in my programming career. That moment when you hit run and all the data comes out how you wanted it to. It brings on the realization that your result would only be possible with a true understanding of the process you were implementing. A simultaneous victory and confirmation of comprehension is a good feeling.

Beauty in the bytes.

sklam · on Feb 18, 2013

Here's my implementation of Jenks in Numba: https://gist.github.com/sklam/4979921

It uses NumPy arrays instead of lists. Doing so without Numba is a lot slower, because indexing NumPy arrays and operating on array scalars are both slow.

NelsonMinar · on Feb 18, 2013

Fantastic article. I like to think of this phenomenon with a positive spin, though: I can use the Jenks algorithm without understanding anything about how it works, just plug it in and go. And with a few test cases I can even port it to a new language without really understanding it. I admire Tom's work in digging in and doing it right, but as a journeyman programmer I like that I can just use it without really understanding it. Sometimes cargo cults work.

stcredzero · on Feb 18, 2013

> If you’re a coder, consider whether the abstraction of software can be misused to mask ignorance of basic principles.

One of the problems with Computer Science and Programming is that such a phenomenon works most strongly within the field.

I also love this quote the author included: "The lack of interest, the disdain for history is what makes computing not-quite-a-field." – Alan Kay

kybernetikos · on Feb 18, 2013

I do love the docco style documentation, but I tend to think of 'literate' as a specific thing, http://en.wikipedia.org/wiki/Literate_programming which I don't think this quite matches.

shared4you · on Feb 18, 2013

> In basic benchmarks, it’s 12x faster than a Python implementation

Oh well, my friend, why didn't you use Numpy?

tmcw · on Feb 18, 2013

Numpy would be faster, and so would PyPy as I point out. I'm comparing unoptimized implementations on purpose and not trying to incite some kind of language-speed-flamewar.

stcredzero · on Feb 18, 2013

> I'm comparing unoptimized implementations on purpose and not trying to incite some kind of language-speed-flamewar.

Think about this sentence for a few minutes, then come back so we can all have a good laugh at ourselves and the fragility of human pride.

EDIT: Downvoted? This was not meant with any kind of meanness at all. No matter what choices an author makes with benchmarks, someone will complain. The situation is such a catch-22 that laughter is the only effective self-defense.

Evbn · on Feb 18, 2013

To avoid sparking flame wars, avoid posting meaningless metrics.

drewda · on Feb 18, 2013

For those who care, here's a previous implementation in Python that I've used: http://danieljlewis.org/2010/06/07/jenks-natural-breaks-algo...

darksaints · on Feb 18, 2013

This is awesome. I feel like I owe you a couple of beers or something.

JoeAltmaier · on Feb 18, 2013

Thanks for working this out, of course. But if the old code produced nice-looking plots, what can be said of the new, except that you like the way it reads better?

ynniv · on Feb 18, 2013

This deserves a "Documentation FTW" gold star.