Hacker News: Stories from July 30, 2012
1.Lying on your resume (steveblank.com)
460 points by ridruejo on July 30, 2012 | 232 comments
2.Lessons in website security anti-patterns by Tesco (troyhunt.com)
352 points by troyhunt on July 30, 2012 | 118 comments
3.Why Apple's new ads look like Microsoft made them. (seanoliver.me)
337 points by seanoliver on July 30, 2012 | 227 comments
4.Ubisoft "Uplay" DRM exposed as rootkit
317 points by rightclick on July 30, 2012 | 136 comments
5.Chaos Monkey released into the wild (netflix.com)
235 points by timf on July 30, 2012 | 36 comments
6.Codecademy now has Python lessons (codecademy.com)
215 points by arjunblj on July 30, 2012 | 56 comments
7.ASCII Google Streetview (tllabs.io)
200 points by divy on July 30, 2012 | 42 comments
8.Sam Soffes open sources Cheddar for iOS (github.com/nothingmagical)
179 points by jamesjyu on July 30, 2012 | 28 comments
9.Show HN: My weekend project using the Soundcloud API (getworkdonemusic.com)
178 points by ryanio on July 30, 2012 | 83 comments
10.Titan, one of Saturn's moons, has an underground ocean (nasa.gov)
178 points by rblion on July 30, 2012 | 40 comments

My startup is essentially an advertising aggregator (pooling traffic from a variety of publishers and routing it to advertisers), and dealing with things like bot detection is a HUGE chunk of what we work on, technology-wise. Let me try to give you an idea of how deep the rabbit hole can go.

- Okay, you want to detect bots. Well, "good" bots usually have a user agent string like, "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)." So, let's just block those.

- Wait, what are "those"? There is no standardized way for a user agent to say, "if this flag is set, I'm a bot." You literally have to substring match on the user agent.

- Okay, let's just substring match on the word "bot." Wait, then you miss user agents like "FeedBurner/1.0 (http://www.FeedBurner.com)." Obviously some sort of FeedBurner bot, but it doesn't contain "bot" or "crawler" or "spider" or any other telltale term.

- How about we just make a "blacklist" of these known bots, look up every user agent, and compare against the blacklist? So now every single request to your site has to do a substring match against every single term in this list. Depending on your site's implementation, this is probably not trivial to do without taking some sort of performance hit.
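The blacklist approach can be sketched in a few lines. The substrings and function name here are illustrative, not a production list; note the per-request cost of scanning every term:

```python
# Illustrative blacklist of known bot user-agent substrings (not exhaustive).
KNOWN_BOT_SUBSTRINGS = [
    "googlebot",
    "bingbot",
    "feedburner",  # entries like this are why matching just "bot" fails
    "crawler",
    "spider",
]

def looks_like_known_bot(user_agent: str) -> bool:
    """True if the user agent contains any blacklisted substring.

    This is one substring scan per blacklist entry, per request --
    exactly the performance hit described above."""
    ua = user_agent.lower()
    return any(term in ua for term in KNOWN_BOT_SUBSTRINGS)
```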

- Also, you haven't even addressed the fact that user agents are specified by the clients, so it's trivial to make a bot that identifies itself as "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1." No blacklist is going to catch that guy.

- Okay, let's use something else to flag bot vs. non-bot. Say, let's check whether the client can execute JavaScript. Let's log information about the clients that can't, and then build some sort of system that analyzes those clients and finds trends (for example, whether they originate from a certain IP range).
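One minimal way to implement the JavaScript check (all names here are invented for illustration): the server pairs each landed click with a token, the page's JS echoes the token back via a beacon request, and clicks whose token never returns get flagged for later analysis:

```python
import uuid

served_clicks = {}       # token -> client IP (or whatever you analyze later)
verified_tokens = set()  # tokens echoed back by client-side JS

def record_click(client_ip: str) -> str:
    """Called when the ad click lands; the returned token is embedded in
    the served page, and the page's JS sends it back to a beacon endpoint."""
    token = uuid.uuid4().hex
    served_clicks[token] = client_ip
    return token

def record_beacon(token: str) -> None:
    """Called by the beacon endpoint -- proof the client actually ran JS."""
    verified_tokens.add(token)

def suspicious_clicks() -> list:
    """Clicks that never executed the beacon JS: candidates for bot
    analysis, e.g. grouping by IP range to find trends."""
    return [(t, ip) for t, ip in served_clicks.items()
            if t not in verified_tokens]
```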

- This is smarter than just matching substrings, but this means you may not catch bots until after the fact. So if you have any sort of business where people pay you per click, and they expect those clicks not to be bots, then you need some way to say, "okay, I think I sent you 100 clicks, but let me check if they were all legit, so don't take this number as holy until 24 hours have passed." This is one of the reasons why products like Google AdWords don't have real-time reporting.
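The delayed-settlement idea can be sketched as a ledger that distinguishes provisional from settled counts; the 24-hour window mirrors the example above, and the class and field names are illustrative:

```python
from datetime import datetime, timedelta

# Clicks only become billable after a review window passes without the
# click being flagged as bot traffic.
REVIEW_WINDOW = timedelta(hours=24)

class ClickLedger:
    def __init__(self):
        self.clicks = []  # list of (timestamp, flagged_as_bot)

    def record(self, ts: datetime, flagged_as_bot: bool = False):
        self.clicks.append((ts, flagged_as_bot))

    def provisional_count(self) -> int:
        """What the advertiser sees immediately: everything we sent."""
        return len(self.clicks)

    def settled_count(self, now: datetime) -> int:
        """What the advertiser is actually billed for: clicks older than
        the review window that were never flagged."""
        return sum(1 for ts, flagged in self.clicks
                   if not flagged and now - ts >= REVIEW_WINDOW)
```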

- And then when you get successful enough, someone is going to target your site with a very advanced bot that CAN seem like a legit user in most cases (i.e., it can run JavaScript and solve CAPTCHAs) and spam-click the shit out of your site, and you're going to have a customer that's on the hook to you for thousands of dollars even though you didn't send them a single legit user. This will cause them to TOTALLY FREAK THE FUCK OUT, and if you aren't used to handling customers FREAKING THE FUCK OUT, you are going to have a business and technical mess on your hands. A business mess, because it will be very easy to conclude you did this maliciously, and you're now one Hacker News post away from having a customer drag your name through the mud; for the next several months, 7 out of the top 10 results on any Google search for your company's name will be that post and related ones. And a technical mess, because your system is probably based on, you know, people actually paying you what you think they should, and if you have no concept of "issuing a credit" or "reverting what happened," then get ready for some late nights.
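The "issuing a credit" concept that last point warns about can be sketched as append-only billing: a reversal is an offsetting line on the invoice, never a mutation of history, so bot traffic can be backed out cleanly and auditably. The class and method names below are illustrative:

```python
class Invoice:
    """Append-only invoice: corrections are new lines, not edits."""

    def __init__(self):
        self.lines = []  # (description, amount_cents)

    def charge(self, description: str, amount_cents: int):
        self.lines.append((description, amount_cents))

    def credit(self, description: str, amount_cents: int):
        # Same mechanism with a negative amount: an auditable reversal,
        # rather than "reverting what happened" by rewriting records.
        self.lines.append((description, -amount_cents))

    def total_cents(self) -> int:
        return sum(amount for _, amount in self.lines)
```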

I'm seriously only scratching the surface here. That being said, I'm not saying, "this is a hard problem, cut Facebook some slack." If they're indeed letting in this volume of non-legit traffic, for a company with their resources, there is pretty much no excuse.

Even if you don't have the talent to preemptively flag and invalidate bot traffic, you can still invest in the resources to provide a good customer experience: someone who can pick up a phone and say, "yeah, please don't worry about those 50,000 clicks it looks like we sent you; it's going to take us a while, but we'll make sure you don't have to pay for them, and we'll do everything we can to prevent this from happening again." In my opinion, this is Facebook's critical mistake. You can have infallible technology, or you can have a decent customer service experience. Having neither, unfortunately, leads to experiences exactly like the OP's.

12.Live Streaming in Rails 4.0 (tenderlovemaking.com)
141 points by tenderlove on July 30, 2012 | 38 comments
13.Localtunnel: Show localhost to the rest of the world (progrium.com)
135 points by rohshall on July 30, 2012 | 55 comments
14.Google Talk for Developers (developers.google.com)
130 points by ovechtrick on July 30, 2012 | 40 comments
15.NewsBlur (YC S12) Takes Feed Reading Back To Its Basics (techcrunch.com)
131 points by conesus on July 30, 2012 | 38 comments
16.The Heretic - the U.S. government banned medical studies of the effects of LSD. (themorningnews.org)
126 points by username3 on July 30, 2012 | 37 comments
17.Raspberry Pi Myths and Truths (hanselman.com)
126 points by shawndumas on July 30, 2012 | 55 comments
18.Renewables now account for 25% of German energy production (reuters.com)
124 points by geogra4 on July 30, 2012 | 126 comments
19.The New York Times Is Now Supported by Readers, Not Advertisers (nymag.com)
117 points by shrikant on July 30, 2012 | 57 comments

I'm beginning to feel that Apple wasn't a company, it was a man. His employees were just extensions of his brain; they would make what he wanted, and if he didn't know what he wanted they would make every variety they could come up with until they hit it right.

Without Steve, Apple still has all the raw talent they've had for years, there's still so much creativity sitting in that office. But without a lens to distill it, without a final authoritative sign-off, they don't seem to know anymore what is good enough and what is Apple.

I'm willing to give them the benefit of the doubt that they will find it again, but I'm not willing to bet on it. Apple is on track to become just another PC vendor, just another consumer products vendor. There's not much magic coming from Cupertino lately.

21.Chrome treats DELETE requests as GET and caches them (code.google.com)
108 points by eranation on July 30, 2012 | 68 comments
22.And then the music stopped (37signals.com)
108 points by scottshea on July 30, 2012 | 64 comments

Copying the text just in case FB decides they don't like it

---

Hey everyone, we're going to be deleting our Facebook page in the next couple of weeks, but we wanted to explain why before we do. A couple months ago, when we were preparing to launch the new Limited Run, we started to experiment with Facebook ads. Unfortunately, while testing their ad system, we noticed some very strange things. Facebook was charging us for clicks, yet we could only verify about 20% of them actually showing up on our site. At first, we thought it was our analytics service. We tried signing up for a handful of other big-name companies, and still, we couldn't verify more than 15-20% of clicks.

So we did what any good developers would do. We built our own analytics software. Here's what we found: on about 80% of the clicks Facebook was charging us for, JavaScript wasn't on. And if the person clicking the ad doesn't have JavaScript, it's very difficult for an analytics service to verify the click. What's important here is that in all of our years of experience, only about 1-2% of people coming to us have JavaScript disabled, not 80% like these clicks coming from Facebook.

So we did what any good developers would do. We built a page logger. Any time a page was loaded, we'd keep track of it. You know what we found? The 80% of clicks we were paying for were from bots. That's correct. Bots were loading pages and driving up our advertising costs. So we tried contacting Facebook about this. Unfortunately, they wouldn't reply. Do we know who the bots belong to? No. Are we accusing Facebook of using bots to drive up advertising revenue? No. Is it strange? Yes. But let's move on, because who the bots belong to isn't provable.

While we were testing Facebook ads, we were also trying to get Facebook to let us change our name, because we're not Limited Pressing anymore. We contacted them on many occasions about this. Finally, we got a call from someone at Facebook. They said they would allow us to change our name. NICE! But only if we agreed to spend $2000 or more in advertising a month. That's correct. Facebook was holding our name hostage. So we did what any good hardcore kids would do. We cursed that piece of shit out! Damn we were so pissed. We still are. This is why we need to delete this page and move away from Facebook. They're scumbags and we just don't have the patience for scumbags.

Thanks to everyone who has supported this page and liked our posts. We really appreciate it. If you'd like to follow us on Twitter, where we don't get shaken down, you can do so here: http://twitter.com/limitedrun
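The verification technique the post describes (a server-side page logger checked against JS-verified analytics) boils down to one ratio. The numbers below are round figures standing in for their reported ~80% vs. 1-2%; the function name is invented for illustration:

```python
def no_js_rate(total_pageloads: int, js_verified: int) -> float:
    """Share of server-logged page loads that never executed JavaScript."""
    return (total_pageloads - js_verified) / total_pageloads

# Round figures mirroring the post: ~80% of paid Facebook-ad clicks never
# ran JS, against a 1-2% baseline from years of their own organic traffic.
fb_ad_clicks = no_js_rate(total_pageloads=1000, js_verified=200)
organic_baseline = no_js_rate(total_pageloads=1000, js_verified=985)
```

A gap that large between the paid-traffic rate and the organic baseline is the whole argument: it's far too big to explain as ordinary users with JavaScript disabled.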

24.Every iPhone Prototype Apple Ever Made Before They Released The First iPhone (cultofmac.com)
106 points by playhard on July 30, 2012 | 53 comments
25.Show HN: gridster.js, a drag-and-drop grid plugin that actually works (gridster.net)
106 points by dmarinoc on July 30, 2012 | 27 comments
26.Moog Google Doodle open sourced (code.google.com)
104 points by anigbrowl on July 30, 2012 | 12 comments
27.Jonah Lehrer Resigns From The New Yorker After Making Up Quotes (nytimes.com)
99 points by kevinalexbrown on July 30, 2012 | 72 comments
28.Amicus (YC S12) Uses Facebook To Mobilize Volunteers for Nonprofits (techcrunch.com)
101 points by sethbannon on July 30, 2012 | 16 comments
29.Does category theory make you a better programmer? (debasishg.blogspot.nl)
93 points by jamesbritt on July 30, 2012 | 61 comments

I understand that it is challenging for a third party like you to identify the bots. But Facebook has more than enough signal on each user clicking on an ad (friends, posts, location, the ratio of ad-click activity to ordinary activity) to determine whether that user is a bot. It looks like they are choosing not to do anything with this information. "FB is turning a blind eye to protect their revenue" is the explanation Occam's razor favors.
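As a toy illustration of the kind of signal only the platform has, a heuristic like the following would be trivial for Facebook and impossible for an advertiser. Every field name and threshold here is invented; real invalid-traffic detection is far more sophisticated:

```python
def looks_like_click_bot(friends: int, posts: int,
                         ad_clicks: int, other_actions: int) -> bool:
    """Hypothetical account-level heuristic: an account with almost no
    social graph or organic activity, but plenty of ad clicks, is an
    obvious outlier that only the platform can see."""
    if other_actions == 0:
        # No organic activity at all: any ad click is suspect.
        return ad_clicks > 0
    click_ratio = ad_clicks / other_actions
    return friends < 5 and posts < 3 and click_ratio > 1.0
```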
