Very interesting read. As someone who struggled through implementing Jenks[1] in PHP a few years ago, I can attest that it was extremely difficult to find any implementation of Jenks that was understandable and clear. I was able to find some examples of code, but nothing explained what was actually happening and, as you said, those examples had no comments or poor ones.
What I was able to find were a couple of different explanations of the mathematics. Instead of trying to decode some crappy implementations, I spent my time teaching myself the math to understand what the representation of the process[2] actually meant. From there, I was able to understand the iterative process involved and dream up a working solution.
Instead of concluding programmers should be better or knowledge should be more free, I came to the conclusion that I just needed to understand the problem and the math behind it better.
Could it be that maybe the assertion regarding the problem of availability is actually just a problem of lack of understanding or desire for proper comprehension? I'm not accusing the author of laziness, merely wondering if a different conclusion and solution could be arrived at with an alternate point of view on the problem.
[1] - https://github.com/randomdrake/jenks - feel free to check out my implementation. I wrote it a few years ago, so it's probably not the greatest, but it's commented, and worked fine and fast.
Hey - I should have mentioned your implementation - I stumbled upon it and was like "whoah, this guy actually wrote this from scratch" :)
In this case, it's a combination - the algorithm Jenks arrived at is not just the math, implemented, but a clever solution that (afaik) has not been expressed in pure-math terms.
Interesting that you mention the "combination." When I was able to find the aforementioned image that showed the Jenks method, I knew I had to understand what it meant and what the symbols were. I found it very cool that the mathematics were simple and it was the process that made the method work so well.
Basic exponents and algebra were all that were really needed, but the magic was in the method; much like good programming.
The insight into the power of elegant processes wrapping simple mathematics is something I've repeatedly experienced in my programming career. There is that moment when you hit run and all the data comes out how you wanted it to. It brings the realization that your result was only possible because you truly understood the process you were implementing. A simultaneous victory and confirmation of comprehension is a good feeling.
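For anyone curious how little math is involved, here is a minimal Python sketch of the idea as I understand it: sort the values, then choose the contiguous classes that minimize the total sum of squared deviations from each class mean. This is not the code from the repository linked in [1] or from the article; the name `jenks_classes` and the dynamic-programming layout are assumptions made for illustration.

```python
def jenks_classes(data, n_classes):
    """Split data into n_classes contiguous groups (after sorting) that
    minimize the total within-class sum of squared deviations.
    Illustrative sketch only; function name and layout are hypothetical."""
    values = sorted(data)
    n = len(values)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(data)")

    # Prefix sums let us compute the squared deviation of any slice quickly.
    prefix = [0.0]
    prefix_sq = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def cost(i, j):
        # Sum of squared deviations from the mean for values[i:j] (j > i).
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # best[k][j]: lowest total cost for the first j values in k classes.
    best = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    # split[k][j]: start index of the last class in that best arrangement.
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    best[0][0] = 0.0

    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    split[k][j] = i

    # Walk the recorded split points backwards to recover the classes.
    classes = []
    j = n
    for k in range(n_classes, 0, -1):
        i = split[k][j]
        classes.append(values[i:j])
        j = i
    classes.reverse()
    return classes


print(jenks_classes([1, 2, 3, 10, 11, 12, 50, 51, 52], 3))
# [[1, 2, 3], [10, 11, 12], [50, 51, 52]]
```

The only arithmetic is means and squares; everything else is bookkeeping about which split points to try, which is roughly what I mean by the magic being in the method.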
Fantastic article. I like to think of this phenomenon with a positive spin, though. I can use the Jenks algorithm without understanding anything about how it works: just plug it in and go. And with a few test cases I can even port it to a new language without really understanding it. I admire Tom's work in digging in and doing it right, but as a journeyman programmer I like that I can just use it without really understanding it. Sometimes cargo cults work.
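If you go the port-by-test-cases route, one way to generate trustworthy test cases is a brute-force reference that tries every possible set of break positions on a small input. The sketch below is hypothetical (the names `ssd` and `brute_force_classes` are made up here, and it is far too slow for real data), but it is simple enough to check by eye, and its output can be compared against whatever port you are testing.

```python
from itertools import combinations


def ssd(values):
    """Sum of squared deviations from the mean of a non-empty list."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def brute_force_classes(data, n_classes):
    """Try every way to cut the sorted data into n_classes contiguous
    groups and keep the one with the smallest total squared deviation.
    Exponential in the worst case: only meant as a test oracle."""
    values = sorted(data)
    n = len(values)
    best_cost = None
    best_classes = None
    # Each combination is a set of cut positions between neighbours.
    for cuts in combinations(range(1, n), n_classes - 1):
        edges = (0,) + cuts + (n,)
        classes = [values[a:b] for a, b in zip(edges, edges[1:])]
        total = sum(ssd(c) for c in classes)
        if best_cost is None or total < best_cost:
            best_cost = total
            best_classes = classes
    return best_classes


print(brute_force_classes([4, 5, 9, 10], 2))
# [[4, 5], [9, 10]]
```

Inputs with ties (several splits with the same total) can legitimately produce different classes in different implementations, so it is probably safest to compare total cost rather than exact breaks in those cases.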
Numpy would be faster, and so would PyPy as I point out. I'm comparing unoptimized implementations on purpose and not trying to incite some kind of language-speed-flamewar.
> I'm comparing unoptimized implementations on purpose and not trying to incite some kind of language-speed-flamewar.
Think about this sentence for a few minutes, then come back so we can all have a good laugh at ourselves and the fragility of human pride.
EDIT: Downvoted? This was not meant with any kind of meanness at all. No matter what choices an author makes with benchmarks, someone will complain. The situation is such a catch-22 that laughter is the only effective self-defense.
Thanks for working this out, of course. But if the old code produced nice-looking plots, what can be said of the new code, except that you like the way it reads better?
[2] - http://randomdrake.com/jenks.gif