Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

the biggest headache with the c# implementation was the threading. A lot of the out-of-the-box threading structures (pools, etc...) have limitations you might not think about checking for; e.g. you can't set the number of threads lower than the CPU count on the machine with some of the official .net threadPool helpers; you can try, but it will just silently ignore you.

There is some super useful stuff too though that made it easy to write a generic extensible crawler. My implementation ended up supporting separately compiled plugins you could just dump in a 'plugins\' directory, which responded to events and had full ability to manipulate the output pipeline. Do-able in lots of languages, but c# has some formalized helpers around it that make it super easy.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: