The observation that two words are anagrams if they contain all the same letters...

lisper · on Feb 21, 2017

It's not entirely clear that the O(nk) solution actually wins because k is small in this case, and so log(k) is very small. Unless you do some careful bit packing, the O(nk) algorithm will use more memory, which could make it slower because of increased cache misses.

geocar · on Feb 22, 2017

The O(nk) solution has fewer branches (O(1) versus O(log k)), and even if you use a naïve implementation, O(nk) will still fit into L1 (assuming n is bigger than k).

lisper · on Feb 22, 2017

> O(nk) will still fit into L1

Not even close. The lexicon is 230k words, times 26 letters is nearly 6MB. The i7 has a 64kB L1 cache and a 246kB L2 cache.

I think the only way to resolve this would be to actually do the experiment. I wouldn't bet my life savings on the outcome either way, but I'll give you even odds at low stakes that sorting is faster.

lilyball · on Feb 21, 2017

Yeah, there are other solutions, such as allocating an array of 26 letters (ignoring accents) and using that to count, but sorting is easy and is generally fast enough.

wrsh07 · on Feb 21, 2017

Sorting is O(kn) where k is the largest number. (See radix sorting)

geocar · on Feb 22, 2017

It also requires O(k) memory which is certainly larger than L1, causing two stalls instead of one.

wrsh07 · on Feb 22, 2017

In this case, k is 26. Or ASCII value of 'z'. Either way, very cheap.

ouid · on Feb 21, 2017

Map a-z to the first 26 primes, multiply and then quicksort your dictionary.

geocar · on Feb 22, 2017

Limited to around 11 character words/phrases. You can do better.

ouid · on Feb 24, 2017

yeah, there's no reason to sort, you can just hash. but I'm not sure I can improve the asymptotic complexity of a comparison