
I think that paragraph perfectly demonstrates why all the commenters on Why I Don’t Talk to Google Recruiters (https://news.ycombinator.com/item?id=13696004) who insist that programmers don't need to know algorithms are wrong. It should be immediately obvious to anyone with a cursory understanding of algorithms that computing all the permutations of every word is much more expensive than necessary, and from there it's not all that hard to figure out that sorting the letters produces the desired effect.
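The sorted-letters idea the parent is alluding to can be sketched in a few lines; this is a generic illustration (the word list and helper names are mine, not from the article):

```python
from collections import defaultdict

def group_anagrams(words):
    """Group words by their sorted-letter signature: two words are
    anagrams exactly when their sorted letters are equal."""
    groups = defaultdict(list)
    for w in words:
        groups["".join(sorted(w))].append(w)
    return groups

groups = group_anagrams(["listen", "silent", "enlist", "google"])
# "listen", "silent", and "enlist" all share the signature "eilnst"
```

One linear pass plus an O(k log k) sort per word, instead of enumerating up to k! permutations per word.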


Is it though? There are only 230k words in that list, and most words aren't that long. Building a trie and then trying all the permutations of every word seems like it would finish in a reasonable amount of time. Even the O(n^2) approach of comparing pairwise would probably get you to a reasonable time, especially since this is a one-time thing and doesn't get run over and over.
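For a rough sense of scale in this tradeoff, a quick sketch comparing factorial growth against the cost of a comparison sort (the specific word lengths are just illustrative):

```python
import math

# Trying every permutation of a k-letter word costs up to k! checks,
# while sorting its letters costs roughly k*log2(k) comparisons.
for k in (5, 8, 10, 12):
    print(k, math.factorial(k), round(k * math.log2(k)))
```

Even at k = 10 that is 3,628,800 permutations versus about 33 comparisons, which is why "it finishes eventually" and "it does vastly more work" can both be true.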

And your argument is kind of silly when, in the very same article, he uses brute force to count the segments. Shouldn't he have found the optimal algorithm for that instead of brute forcing it?


Sorting the words is significantly easier in virtually every programming language than testing all of the permutations. So the "it should finish in a reasonable amount of time" isn't a good excuse, since you're doing more work than necessary in order to implement the suboptimal solution.


The observation that two words are anagrams only if they contain all the same letters isn't a huge leap from "two words that contain all the same letters are anagrams."

That being said: Sorting isn't required. (An O(kn) solution exists to beat your O(nk log k))
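The O(kn) alternative being referred to here is presumably a letter-count signature rather than a sorted string; a minimal sketch, assuming lowercase a-z input (the function name is mine):

```python
def letter_count_key(word):
    """O(k) anagram signature: a 26-tuple of letter counts.
    Two words are anagrams iff their count tuples are equal."""
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord('a')] += 1
    return tuple(counts)

# letter_count_key("listen") == letter_count_key("silent")
```

Computing this for all n words is O(nk) total, with no per-word sort.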


It's not entirely clear that the O(nk) solution actually wins because k is small in this case, and so log(k) is very small. Unless you do some careful bit packing, the O(nk) algorithm will use more memory, which could make it slower because of increased cache misses.


The O(nk) solution has fewer branches (O(1) versus O(log k)), and even if you use a naïve implementation, O(nk) will still fit into L1 (assuming n is bigger than k).


> O(nk) will still fit into L1

Not even close. The lexicon is 230k words, times 26 letters is nearly 6MB. The i7 has a 64kB L1 cache and a 256kB L2 cache.

I think the only way to resolve this would be to actually do the experiment. I wouldn't bet my life savings on the outcome either way, but I'll give you even odds at low stakes that sorting is faster.


Yeah, there are other solutions, such as allocating an array of 26 letters (ignoring accents) and using that to count, but sorting is easy and is generally fast enough.


Sorting is O(kn) where k is the largest key value. (See radix sort.)
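Over a fixed 26-letter alphabet, each word can be sorted by counting rather than comparing; a sketch of that counting sort, assuming lowercase a-z (the helper name is mine):

```python
def sort_letters(word):
    """Counting sort over a fixed alphabet: O(k + 26) per word,
    with no comparison-based sorting."""
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord('a')] += 1
    # Rebuild the word in alphabetical order from the counts.
    return "".join(chr(ord('a') + i) * c for i, c in enumerate(counts))

# sort_letters("banana") -> "aaabnn"
```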


It also requires O(k) memory which is certainly larger than L1, causing two stalls instead of one.


In this case, k is 26. Or ASCII value of 'z'. Either way, very cheap.


Map a-z to the first 26 primes, multiply and then quicksort your dictionary.
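A sketch of that prime-product signature, assuming lowercase a-z input: by unique factorization, two words get the same product iff they are anagrams. Python integers are arbitrary precision, but in a language with 64-bit integers the product overflows for longer words, which is the limitation raised in the reply below.

```python
# The first 26 primes, one per letter a-z.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
          47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def prime_key(word):
    """Anagram signature via unique factorization: the product of one
    prime per letter is order-independent and collision-free."""
    key = 1
    for ch in word:
        key *= PRIMES[ord(ch) - ord('a')]
    return key

# prime_key("listen") == prime_key("silent")
```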


Limited to around 11 character words/phrases. You can do better.


Yeah, there's no reason to sort; you can just hash. But I'm not sure I can improve the asymptotic complexity of a comparison.



