| 1. | | Lying on your resume (steveblank.com) |
| 460 points by ridruejo on July 30, 2012 | 232 comments |
|
| 2. | | Lessons in website security anti-patterns by Tesco (troyhunt.com) |
| 352 points by troyhunt on July 30, 2012 | 118 comments |
|
| 3. | | Why Apple's new ads look like Microsoft made them. (seanoliver.me) |
| 337 points by seanoliver on July 30, 2012 | 227 comments |
|
| 4. | | Ubisoft "Uplay" DRM exposed as rootkit |
| 317 points by rightclick on July 30, 2012 | 136 comments |
|
| 5. | | Chaos Monkey released into the wild (netflix.com) |
| 235 points by timf on July 30, 2012 | 36 comments |
|
| 6. | | Codecademy now has Python lessons (codecademy.com) |
| 215 points by arjunblj on July 30, 2012 | 56 comments |
|
| 7. | | ASCII Google Streetview (tllabs.io) |
| 200 points by divy on July 30, 2012 | 42 comments |
|
| 8. | | Sam Soffes open sources Cheddar for iOS (github.com/nothingmagical) |
| 179 points by jamesjyu on July 30, 2012 | 28 comments |
|
| 9. | | Show HN: My weekend project using the Soundcloud API (getworkdonemusic.com) |
| 178 points by ryanio on July 30, 2012 | 83 comments |
|
| 10. | | Titan, one of Saturn's moons, has an underground ocean (nasa.gov) |
| 178 points by rblion on July 30, 2012 | 40 comments |
|
| 12. | | Live Streaming in Rails 4.0 (tenderlovemaking.com) |
| 141 points by tenderlove on July 30, 2012 | 38 comments |
|
| 13. | | Localtunnel: Show localhost to the rest of the world (progrium.com) |
| 135 points by rohshall on July 30, 2012 | 55 comments |
|
| 14. | | Google Talk for Developers (developers.google.com) |
| 130 points by ovechtrick on July 30, 2012 | 40 comments |
|
| 15. | | NewsBlur (YC S12) Takes Feed Reading Back To Its Basics (techcrunch.com) |
| 131 points by conesus on July 30, 2012 | 38 comments |
|
| 16. | | The Heretic - the U.S. government banned medical studies of the effects of LSD. (themorningnews.org) |
| 126 points by username3 on July 30, 2012 | 37 comments |
|
| 17. | | Raspberry Pi Myths and Truths (hanselman.com) |
| 126 points by shawndumas on July 30, 2012 | 55 comments |
|
| 18. | | Renewables now account for 25% of German energy production (reuters.com) |
| 124 points by geogra4 on July 30, 2012 | 126 comments |
|
| 19. | | The New York Times Is Now Supported by Readers, Not Advertisers (nymag.com) |
| 117 points by shrikant on July 30, 2012 | 57 comments |
|
| 21. | | Chrome treats DELETE requests as GET and caches them (code.google.com) |
| 108 points by eranation on July 30, 2012 | 68 comments |
|
| 22. | | And then the music stopped (37signals.com) |
| 108 points by scottshea on July 30, 2012 | 64 comments |
|
| 24. | | Every iPhone Prototype Apple Ever Made Before They Released The First iPhone (cultofmac.com) |
| 106 points by playhard on July 30, 2012 | 53 comments |
|
| 25. | | Show HN: gridster.js, a drag-and-drop grid plugin that actually works (gridster.net) |
| 106 points by dmarinoc on July 30, 2012 | 27 comments |
|
| 26. | | Moog Google Doodle open sourced (code.google.com) |
| 104 points by anigbrowl on July 30, 2012 | 12 comments |
|
| 27. | | Jonah Lehrer Resigns From The New Yorker After Making Up Quotes (nytimes.com) |
| 99 points by kevinalexbrown on July 30, 2012 | 72 comments |
|
| 28. | | Amicus (YC S12) Uses Facebook To Mobilize Volunteers for Nonprofits (techcrunch.com) |
| 101 points by sethbannon on July 30, 2012 | 16 comments |
|
| 29. | | Does category theory make you a better programmer? (debasishg.blogspot.nl) |
| 93 points by jamesbritt on July 30, 2012 | 61 comments |
|
- Okay, you want to detect bots. Well, "good" bots usually have a user agent string like, "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)." So, let's just block those.
- Wait, what are "those"? There is no standardized way for a user agent to say, "if this flag is set, I'm a bot." You would literally have to substring-match on the user agent.
- Okay, let's just substring match on the word "bot." Wait, then you miss user agents like, "FeedBurner/1.0 (http://www.FeedBurner.com)." Obviously some sort of FeedBurner bot, but it doesn't have "bot" or "crawler" or "spider" or any other term in there.
- How about we just make a "blacklist" of these known bots, look up every user agent, and compare against the blacklist? So now every single request to your site has to do a substring match against every single term in this list. Depending on your site's implementation, this is probably not trivial to do without taking some sort of performance hit.
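To make the cost concrete, here is a minimal sketch of that blacklist approach in Python. The list entries are illustrative, not a real or complete bot list:

```python
# Sketch of the substring-blacklist approach: scan every request's
# user agent against every known-bot term. Entries are illustrative.
BOT_SUBSTRINGS = [
    "googlebot",
    "bingbot",
    "feedburner",  # no "bot"/"crawler"/"spider" in its UA, so it needs its own entry
    "crawler",
    "spider",
]

def looks_like_bot(user_agent: str) -> bool:
    """O(len(list) * len(ua)) substring scan, paid on EVERY request."""
    ua = user_agent.lower()
    return any(term in ua for term in BOT_SUBSTRINGS)

print(looks_like_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
print(looks_like_bot("FeedBurner/1.0 (http://www.FeedBurner.com)"))  # True, but only because we listed it
# A bot that lies about its UA sails right through:
print(looks_like_bot("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1"))  # False
```

Note that the list only ever grows, and every new entry makes every request a little slower.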
- Also, you haven't even addressed the fact that user agents are specified by the clients, so it's trivial to make a bot that identifies itself as "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1." No blacklist is going to catch that guy.
- Okay, let's use something else to flag bot vs. non-bot. Say, let's check whether the client can execute Javascript, log information about the clients that can't, and then build some sort of system that analyzes those clients and finds trends (for example, whether they originate from a certain IP range).
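A minimal sketch of that idea, assuming the page's script fires a beacon request back to the server (the beacon mechanism and data shapes here are hypothetical): an offline job groups page views that never produced a beacon by IP prefix to surface suspicious ranges.

```python
from collections import Counter

def suspicious_prefixes(page_views, beacons, threshold):
    """page_views/beacons: iterables of (request_id, ip) tuples.
    Returns IP prefixes with at least `threshold` page views that
    never executed the page's Javascript (i.e. no beacon arrived)."""
    beaconed = {rid for rid, _ in beacons}
    no_js = Counter()
    for rid, ip in page_views:
        if rid not in beaconed:
            no_js[".".join(ip.split(".")[:3])] += 1  # crude /24-style grouping
    return sorted(p for p, n in no_js.items() if n >= threshold)

views = [("a", "10.0.0.1"), ("b", "10.0.0.2"), ("c", "192.168.1.5")]
beacons = [("c", "192.168.1.5")]  # only request "c" ran the page's JS
print(suspicious_prefixes(views, beacons, threshold=2))  # ['10.0.0']
```

The key property is that this analysis runs after the fact, which is exactly the limitation described next.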
- This is smarter than just matching substrings, but this means you may not catch bots until after the fact. So if you have any sort of business where people pay you per click, and they expect those clicks not to be bots, then you need some way to say, "okay, I think I sent you 100 clicks, but let me check if they were all legit, so don't take this number as holy until 24 hours have passed." This is one of the reasons why products like Google AdWords don't have real-time reporting.
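The "don't take this number as holy for 24 hours" idea can be sketched as follows, with clicks staying provisional until the validation window has passed and the (separate) bot analysis hasn't flagged them. The names and window length are illustrative, not anyone's actual billing logic:

```python
from datetime import datetime, timedelta

VALIDATION_WINDOW = timedelta(hours=24)  # illustrative window

def billable_clicks(clicks, flagged_ids, now):
    """clicks: iterable of (click_id, timestamp).
    Returns the ids that are old enough to have cleared the validation
    window and were not flagged as bot traffic."""
    return [
        cid for cid, ts in clicks
        if now - ts >= VALIDATION_WINDOW and cid not in flagged_ids
    ]

now = datetime(2012, 7, 31, 12, 0)
clicks = [
    ("c1", now - timedelta(hours=30)),  # old enough, legit -> billable
    ("c2", now - timedelta(hours=30)),  # old enough, but flagged as a bot
    ("c3", now - timedelta(hours=2)),   # still inside the window
]
print(billable_clicks(clicks, flagged_ids={"c2"}, now=now))  # ['c1']
```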
- And then when you get successful enough, someone is going to target your site with a very advanced bot that CAN seem like a legit user in most cases (i.e., it can run Javascript and answer CAPTCHAs), and spam-click the shit out of your site, and you're going to have a customer that's on the hook to you for thousands of dollars even though you didn't send them a single legit user. This will cause them to TOTALLY FREAK THE FUCK OUT, and if you aren't used to handling customers FREAKING THE FUCK OUT, you are going to have a business and technical mess on your hands. You will have a business mess because it will be very easy to conclude you did this maliciously, and you're now one Hacker News post away from having a customer drag your name through the mud; for the next several months, 7 out of the top 10 results on any Google search for your company's name will be that post and related ones. And you'll have a technical mess because your system is probably based on, you know, people actually paying you what you think they should, and if you have no concept of "issuing a credit" or "reverting what happened," then get ready for some late nights.
I'm seriously only scratching the surface here. That being said, I'm not saying, "this is a hard problem, cut Facebook some slack." If they're indeed letting in this volume of non-legit traffic, for a company with their resources, there is pretty much no excuse.
Even if you don't have the talent to preemptively flag and invalidate bot traffic, you can still invest in the resources to provide a good customer experience: someone who can pick up a phone and say, "yeah, please don't worry about those 50,000 clicks it looks like we sent you; it's going to take us a while, but we'll make sure you don't have to pay that, and we'll do everything we can to prevent this from happening again." In my opinion this is Facebook's critical mistake. You can have infallible technology, or you can have a decent customer service experience. Not having either, unfortunately, leads to experiences exactly like what the OP had.