I'm guessing that: a "document" on Twitter is a single tweet;
a "document" on Facebook is a wall-post or equivalent;
a "document" on GMail is an e-mail;
a "document" on Google Calendar is an appointment.
Therefore, the comparison with Google’s web-wide index in 2001 is a little misleading (in terms of the amount of data), given that the average size of a web-page is greater than all of these.
Of course, the average size of a file on Dropbox is likely to be larger than a webpage. I wonder what percentage of those 1.5 billion documents are files on Dropbox.
Greplin doesn't index content within files on Dropbox, just the filenames.
I am building a startup that does that, i.e. it indexes your doc/pdf files (more formats coming) and allows you to search through them instantly. It's called grepfiles.com, but it is at a very early stage (pre-alpha), so go easy on it since I am not sure how well it scales. Mail me at mail@asif.in if you have any feedback. I would really appreciate it.