
I think that paragraph perfectly demonstrates why all the commenters on Why I Don’t Talk to Google Recruiters (https://news.ycombinator.com/item?id=13696004) who insist that programmers don't need to know algorithms are wrong. It should be immediately obvious to anyone with a cursory understanding of algorithms that computing all the permutations of every word is much more expensive than necessary, and from there it's not all that hard to figure out that sorting the letters produces the desired effect.
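The sorted-letters idea the parent is alluding to can be sketched in a few lines; this is a generic illustration (the word list and helper names are mine, not from the article):

```python
from collections import defaultdict

def group_anagrams(words):
    """Group words by their sorted-letter signature: two words are
    anagrams exactly when their sorted letters are equal."""
    groups = defaultdict(list)
    for w in words:
        groups["".join(sorted(w))].append(w)
    return groups

groups = group_anagrams(["listen", "silent", "enlist", "google"])
# "listen", "silent", and "enlist" all share the signature "eilnst"
```

One linear pass plus an O(k log k) sort per word, instead of enumerating up to k! permutations per word.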


Is it though? There are only 230k words in that list, and most words aren't that long. Building a trie and then trying all the permutations of every word seems like it would finish in a reasonable amount of time. Even the O(n^2) approach of comparing pairwise would probably get you to a reasonable time, especially since this is a one-time thing and doesn't get run over and over.
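For a rough sense of scale in this tradeoff, a quick sketch comparing factorial growth against the cost of a comparison sort (the specific word lengths are just illustrative):

```python
import math

# Trying every permutation of a k-letter word costs up to k! checks,
# while sorting its letters costs roughly k*log2(k) comparisons.
for k in (5, 8, 10, 12):
    print(k, math.factorial(k), round(k * math.log2(k)))
```

Even at k = 10 that is 3,628,800 permutations versus about 33 comparisons, which is why "it finishes eventually" and "it does vastly more work" can both be true.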

And your argument is kind of silly when, in the very same article, he uses brute force to count the segments. Shouldn't he have found the optimal algorithm for that instead of brute forcing it?


Sorting the words is significantly easier in virtually every programming language than testing all of the permutations. So the "it should finish in a reasonable amount of time" isn't a good excuse, since you're doing more work than necessary in order to implement the suboptimal solution.


The observation that two words are anagrams only if they contain all the same letters isn't a huge leap from "two words that contain all the same letters are anagrams."

That being said: Sorting isn't required. (An O(kn) solution exists to beat your O(nk log k))
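The O(kn) alternative being referred to here is presumably a letter-count signature rather than a sorted string; a minimal sketch, assuming lowercase a-z input (the function name is mine):

```python
def letter_count_key(word):
    """O(k) anagram signature: a 26-tuple of letter counts.
    Two words are anagrams iff their count tuples are equal."""
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord('a')] += 1
    return tuple(counts)

# letter_count_key("listen") == letter_count_key("silent")
```

Computing this for all n words is O(nk) total, with no per-word sort.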


It's not entirely clear that the O(nk) solution actually wins because k is small in this case, and so log(k) is very small. Unless you do some careful bit packing, the O(nk) algorithm will use more memory, which could make it slower because of increased cache misses.


The O(nk) solution has fewer branches (O(1) versus O(log k)), and even if you use a naïve implementation, O(nk) will still fit into L1 (assuming n is bigger than k).


> O(nk) will still fit into L1

Not even close. The lexicon is 230k words, times 26 letters is nearly 6MB. The i7 has a 64kB L1 cache and a 256kB L2 cache.

I think the only way to resolve this would be to actually do the experiment. I wouldn't bet my life savings on the outcome either way, but I'll give you even odds at low stakes that sorting is faster.


Yeah, there are other solutions, such as allocating an array of 26 letters (ignoring accents) and using that to count, but sorting is easy and is generally fast enough.


Sorting is O(kn) where k is the largest key value. (See radix sort.)
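Over a fixed 26-letter alphabet, each word can be sorted by counting rather than comparing; a sketch of that counting sort, assuming lowercase a-z (the helper name is mine):

```python
def sort_letters(word):
    """Counting sort over a fixed alphabet: O(k + 26) per word,
    with no comparison-based sorting."""
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord('a')] += 1
    # Rebuild the word in alphabetical order from the counts.
    return "".join(chr(ord('a') + i) * c for i, c in enumerate(counts))

# sort_letters("banana") -> "aaabnn"
```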


It also requires O(k) memory which is certainly larger than L1, causing two stalls instead of one.


In this case, k is 26. Or ASCII value of 'z'. Either way, very cheap.


Map a-z to the first 26 primes, multiply and then quicksort your dictionary.
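A sketch of that prime-product signature, assuming lowercase a-z input: by unique factorization, two words get the same product iff they are anagrams. Python integers are arbitrary precision, but in a language with 64-bit integers the product overflows for longer words, which is the limitation raised in the reply below.

```python
# The first 26 primes, one per letter a-z.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
          47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def prime_key(word):
    """Anagram signature via unique factorization: the product of one
    prime per letter is order-independent and collision-free."""
    key = 1
    for ch in word:
        key *= PRIMES[ord(ch) - ord('a')]
    return key

# prime_key("listen") == prime_key("silent")
```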


Limited to around 11 character words/phrases. You can do better.


Yeah, there's no reason to sort; you can just hash. But I'm not sure I can improve the asymptotic complexity of a comparison.



