
Interesting article! The metric you use does eliminate triviality, but it sometimes surfaces very obscure (and arguably uninteresting) words, such as calumnies, ivoriness, coprophagist, etc. That's what you describe as "Webster's Second jargon that nobody knows".

It would be interesting if you could adapt your metric to account for the general prevalence of each word in English. Scan a giant subset of, say, Wikipedia, and assign a frequency to each of the 234,000 words in a map, giving unseen words a vanishingly small frequency. Then use the sum or product of the frequencies of the words in each anagram pair to bring out some truly interesting ones!
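Something like the following rough sketch is what I have in mind. The function names, the regex tokenizer, and the `floor` value for unseen words are all my own assumptions, not anything from the article, and I've picked the product rather than the sum so that a pair only scores well when both words are reasonably common.

```python
import re
from collections import Counter

def word_frequencies(corpus_text, dictionary, floor=1e-9):
    """Map each dictionary word to its relative frequency in the corpus.

    Words never seen in the corpus get `floor` (an assumed tiny value)
    instead of zero, so they still take part in the product below.
    """
    counts = Counter(re.findall(r"[a-z]+", corpus_text.lower()))
    total = sum(counts.values()) or 1  # avoid dividing by zero on an empty corpus
    return {w: max(counts[w.lower()] / total, floor) for w in dictionary}

def anagram_score(pair, freqs, floor=1e-9):
    """Score an anagram pair by the product of its two word frequencies."""
    a, b = pair
    return freqs.get(a, floor) * freqs.get(b, floor)

# Toy corpus standing in for a real one such as a Wikipedia dump.
corpus = "the listen silent listen the cat act"
dictionary = ["listen", "silent", "cat", "act", "calumnies", "masculine"]
freqs = word_frequencies(corpus, dictionary)

pairs = [("listen", "silent"), ("calumnies", "masculine"), ("cat", "act")]
ranked = sorted(pairs, key=lambda p: anagram_score(p, freqs), reverse=True)
```

You would then combine this score with the article's existing metric (multiply them, say) so that pairs made of words nobody knows sink to the bottom.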



I would strongly argue that coprophagist is a very interesting word, and should be less obscure. But then again, that might just be my juvenile sense of humor.


Calumnies and coprophagist aren't particularly obscure. I've come across both.

I think you have to discriminate between slightly obscure or archaic words that anyone familiar with a reasonable range of the literary canon would know, and truly uninteresting words that even a highly educated and well-read person wouldn't know.

There are better corpora than Wikipedia that could be used for this purpose, such as the British National Corpus:

http://www.natcorp.ox.ac.uk/



