Having built scrapers that work against some of these measures, I can tell you that you would be really surprised at how often they are accidental. Shared hosting providers often set them as defaults, at least as far as I can see from the work I've done.
The real barrier I find is the current case law in the US, which seems to be the jurisdiction of choice for many web companies. It's currently a real possibility that you will be criminally in breach of the law, and bear the cost of that, if you blatantly and knowingly continue after being notified of their ToS. Yes, Google and other big companies have nothing to fear, but it's pretty much a case of "how many people are dumb enough to pick a fight with Mike Tyson?"
If you target your scraping to further your own business and impinge on someone else's business model, you're in water that is currently murky. It really needs to be settled, but until another lawsuit rises to the Supreme Court in the US, we won't have that. So it's just a matter of being aware that while you're not trying to be an evil criminal, you may still be viewed as one by someone you scrape.