
Well, variable-length codes go way back: Morse code dates from the 1830s and 1840s. If you are already using such codes, it's a fairly natural question to ask how good you can make them.

Edit: Arguably language itself uses variable-length codes of letters / phonemes, but that might be a more difficult parallel to spot.
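To put a number on "how good you can make them", here is a minimal sketch that builds Huffman code lengths for a short sample string and compares the average code length with the entropy and with a fixed-length code. The function name `huffman_code_lengths` and the sample text are my own choices for illustration, not anything from the thread.

```python
import heapq
import math
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    # Heap entries: (total weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freqs}
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them gets one bit deeper.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

text = "variable length codes go way back"  # arbitrary sample input
freqs = Counter(text)
lengths = huffman_code_lengths(freqs)
total = sum(freqs.values())

avg_bits = sum(freqs[s] * lengths[s] for s in freqs) / total
entropy = -sum((n / total) * math.log2(n / total) for n in freqs.values())
fixed_bits = math.ceil(math.log2(len(freqs)))

print(f"entropy {entropy:.3f}, huffman {avg_bits:.3f}, fixed {fixed_bits} bits/symbol")
```

The Huffman average always lands between the entropy and entropy + 1 bit per symbol, and it is never worse than the fixed-length code, which is the sense in which frequent symbols getting short codes pays off.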



There are many places information theory crops up in linguistics (perhaps unsurprisingly).

For example, there's a correlation between the numbers of vowels/consonants in a language and the length of each syllable, time-wise. Languages like English form slow complex syllables, and languages like Japanese have fast simple syllables. Each English syllable conveys more information but takes longer to do so. In the end, English and Japanese have about the same effective number of distinguishable states in the same amount of time -- the same effective bitrate.

There is an uncanny parallel to the trade-off between the symbol constellation size and baud rate with modems.
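The modem side of that parallel is just arithmetic: bitrate is symbols per second times bits per symbol, where bits per symbol is log2 of the constellation size. A rough sketch, with made-up numbers picked only to show two different strategies landing on the same rate (the `bitrate` helper is hypothetical, and it assumes every symbol is equally likely and never confused with another):

```python
import math

def bitrate(symbols_per_second, distinct_symbols):
    """Bits per second, assuming equally likely, perfectly distinguishable symbols."""
    return symbols_per_second * math.log2(distinct_symbols)

# Hypothetical numbers chosen only to show the trade-off.
slow_rich = bitrate(symbols_per_second=1200, distinct_symbols=256)   # 8 bits per symbol
fast_simple = bitrate(symbols_per_second=2400, distinct_symbols=16)  # 4 bits per symbol

print(slow_rich, fast_simple)
```

Halving the symbol rate while squaring the number of distinguishable symbols leaves the rate unchanged, which is the same shape as the slow-complex versus fast-simple syllable trade-off described above.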


Interestingly enough, geographies with drier climates tend to develop languages with more consonants and more complex syllables whereas geographies with more humid climates tend to develop languages with more vowels and simpler syllables. Similar bitrate, but different coding strategies for different conditions.


I've made a few algs for myself over the years (someone nerd-sniped me years ago). The one property I always found interesting was the trade-off between the dictionary table size and the data: the bigger you make one, the smaller the other can be.
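A toy illustration of that trade-off, not any particular real compressor: a word-level coder where the `k` most frequent words go into a table and get replaced by a one-byte index, while everything else is stored as an escape byte plus the raw letters. The cost model, the `sizes` function, and the sample sentence are all assumptions made up for this sketch.

```python
from collections import Counter

def sizes(words, k):
    """Return (table_bytes, data_bytes) for a toy coder with a k-word dictionary."""
    table = [w for w, _ in Counter(words).most_common(k)]
    in_table = set(table)
    # Each table entry is stored as the word plus a one-byte terminator.
    table_bytes = sum(len(w) + 1 for w in table)
    # A table word costs one index byte; any other word costs an escape byte plus its letters.
    data_bytes = sum(1 if w in in_table else 1 + len(w) for w in words)
    return table_bytes, data_bytes

words = "the cat sat on the mat and the dog sat on the log".split()
results = [sizes(words, k) for k in range(0, 5)]

for k, (table_bytes, data_bytes) in enumerate(results):
    print(k, table_bytes, data_bytes, table_bytes + data_bytes)
```

In this model the table only grows and the data only shrinks as `k` goes up, but the total is what matters: a word that appears once costs more in the table than it saves in the data, so the best `k` depends on how repetitive the input is.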



