
Question: is there anything stopping an app from using an invisible embedded web view to pull content from Twitter, mimicking a plain browser, and then scraping the content from the DOM as a poor man's API?

It would be tedious as hell, especially now that Twitter is all-Ajax-all-the-time, and it would be a moving target as they made changes, but could Twitter really do anything to stop it?



I was going to ask that exact question.

I think the answer is no, they can't stop you. And perhaps I'm being naïve, but it doesn't seem like it would be too terribly tedious if you used a scraping library, at least not for replacing basic API functions (e.g. getting a user's recent tweets).
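For the simple case of public tweets, the scraping step might look roughly like the sketch below, which uses only Python's standard-library `html.parser`. It assumes the HTML has already been rendered, for example serialized from a hidden web view after the page's JavaScript has run, since an all-Ajax page won't contain the tweets in its initial response. The class name `tweet-text` and the helper names are hypothetical stand-ins; Twitter's real markup differs and changes over time, which is the "moving target" problem.

```python
from html.parser import HTMLParser

# Hypothetical class name; Twitter's actual markup is different and changes.
TWEET_CLASS = "tweet-text"

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class TweetTextParser(HTMLParser):
    """Collects the text of every element whose class list contains TWEET_CLASS."""

    def __init__(self) -> None:
        super().__init__()
        self.tweets: list[str] = []
        self._depth = 0              # > 0 while inside a matching element
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1         # nested child of a tweet element
        elif TWEET_CLASS in classes:
            self._depth = 1          # entering a tweet element
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:         # just closed the tweet element itself
            self.tweets.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_tweets(html: str) -> list[str]:
    """Hypothetical helper: return the text of each tweet found in rendered HTML."""
    parser = TweetTextParser()
    parser.feed(html)
    parser.close()
    return parser.tweets


sample = """
<div class="stream">
  <p class="tweet-text js-tweet">Hello <a href="#">world</a></p>
  <p class="tweet-text">Second tweet<img src="x.png"></p>
  <p class="bio">not a tweet</p>
</div>
"""
print(extract_tweets(sample))  # ['Hello world', 'Second tweet']
```

Keying on a class name rather than on the exact element path is a deliberate choice: it tends to survive small layout changes, though any rename of the class still breaks it.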

Obviously this would be limited to public tweets (no private tweets, no tweeting on the user's behalf, and no DMs).


"Obviously this would be limited to public tweets (no private tweets, no tweeting on the user's behalf, and no DMs)."

I don't think any of those limitations apply. If you can do something via a web browser, you can do it programmatically, depending on how much pain you're willing to endure.


Sorry, right -- I was just thinking of straight-up scraping of public pages. Asking for the user's password and logging in to do more scraping would probably be possible, but a lot more painful.


> Question: is there anything stopping an app from using an invisible embedded web view to pull content from Twitter, mimicking a plain browser, and then scraping the content from the DOM as a poor man's API?

Yes, it's called iframe busting. You can't force a page into an iframe that doesn't want to be there.


Doesn't have to be inside an iframe; it could be an undisplayed UIWebView, or even a headless process such as PhantomJS.


As soon as they get a whiff that people are doing this, they'll just block the IP range. Site scraping is big, big business, and a cat-and-mouse game. Yes, it can be done, though; you're right.


They could do this if the scraping were done by a remote server, but if the network activity came from each individual user, they would at best have to resort to looking for behavioral "fingerprints" in what otherwise looks like normal web browsing activity.

Let's just say I wish I had the time for such an endeavor, and I sincerely hope someone out there does.
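The IP-range blocking point, and why it works against a central scraping server but not against scraping done from each user's device, can be sketched with Python's standard `ipaddress` module. The blocklist and the `is_blocked` helper are hypothetical, and the addresses come from the RFC 5737 documentation ranges rather than from any real service.

```python
import ipaddress

# Hypothetical blocklist a site might add after spotting a scraping server.
# 203.0.113.0/24 is a documentation-only range (RFC 5737) used as a stand-in.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when the IP versions differ (IPv6 vs IPv4 network).
    return any(addr in net for net in BLOCKED_NETWORKS)


# A centralized scraper runs from one known range, so one rule stops it.
print(is_blocked("203.0.113.42"))  # True

# Per-device scraping arrives from ordinary user addresses scattered across
# many networks, which can't be blocked wholesale without hitting real users.
print(is_blocked("198.51.100.7"))  # False
```

This is why the per-user approach pushes the site toward behavioral fingerprinting instead: the address alone no longer separates scrapers from regular visitors.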



