Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The observation that two words are anagrams if they contain all the same letters isn't a huge leap from two words that contain all the same letters are anagrams.

That being said: Sorting isn't required. (An O(kn) solution exists to beat your O(nk log k))



It's not entirely clear that the O(nk) solution actually wins because k is small in this case, and so log(k) is very small. Unless you do some careful bit packing, the O(nk) algorithm will use more memory, which could make it slower because of increased cache misses.


The O(nk) solution has fewer branches (O(1) versus O(log k)), and even if you use a naïve implementation, O(nk) will still fit into L1 (assuming n is bigger than k).


> O(nk) will still fit into L1

Not even close. The lexicon is 230k words, times 26 letters is nearly 6MB. The i7 has a 64kB L1 cache and a 246kB L2 cache.

I think the only way to resolve this would be to actually do the experiment. I wouldn't bet my life savings on the outcome either way, but I'll give you even odds at low stakes that sorting is faster.


Yeah, there are other solutions, such as allocating an array of 26 letters (ignoring accents) and using that to count, but sorting is easy and is generally fast enough.


Sorting is O(kn) where k is the largest number. (See radix sorting)


It also requires O(k) memory which is certainly larger than L1, causing two stalls instead of one.


In this case, k is 26. Or ASCII value of 'z'. Either way, very cheap.


Map a-z to the first 26 primes, multiply and then quicksort your dictionary.


Limited to around 11 character words/phrases. You can do better.


yeah, there's no reason to sort, you can just hash. but I'm not sure I can improve the asymptotic complexity of a comparison




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: