Truthy is a system to analyze and visualize the
diffusion of information on Twitter. The Truthy system
evaluates thousands of tweets an hour to identify new
and emerging bursts of activity around memes of various
flavors. The data and statistics provided by Truthy are
designed to aid in the study of social epidemics: How do
memes propagate through the Twittersphere? What causes a
burst of popularity?
A friend and I started playing around with Twitter data back in early 2010. We have collected roughly 587 million tweets in total (we stopped collecting earlier this year). We only pulled English tweets, and only those that described what someone was feeling ("I'm", "I am", "I feel", "I am feeling", etc., along with negatives such as "I don't feel" and "I do not feel").
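A minimal sketch of that kind of phrase filter, in Python. The phrase list is a hypothetical reconstruction from the examples above (the full list wasn't posted), and it assumes English-only filtering has already happened upstream.

```python
import re

# Hypothetical phrase list modeled on the examples in the comment;
# the commenter's actual list is not known.
FEELING_RE = re.compile(
    r"\b(?:i'?m|i am|i feel|i am feeling)\b"  # "Im"/"I'm", "I am", "I feel", ...
    r"|\bi (?:don'?t|do not) feel\b",         # negated forms
    re.IGNORECASE,
)


def describes_feeling(text):
    """Return True if the tweet contains a first-person feeling phrase."""
    return FEELING_RE.search(text) is not None


def filter_feeling_tweets(tweets):
    """Keep only tweets that match one of the feeling phrases."""
    return [t for t in tweets if describes_feeling(t)]


if __name__ == "__main__":
    sample = [
        "I feel great today",
        "I don't feel like working",
        "Im so tired",
        "Check out this link",
    ]
    print(filter_feeling_tweets(sample))
```

Word boundaries keep words like "imagine" from matching the "Im" pattern.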
We did see some interesting events during that time, though. This is a graph of anxiety levels on Twitter on March 11th; the horizontal axis is the hour of the day (EST). The earthquake hit Japan at 1:46 EST.
The green line represents the number of tweets marked as anxious, and the blue line represents tweets marked as calm.
You can see that after the tsunami hit there was a general spike in the overall traffic, but a much larger spike for tweets where the user described being anxious.
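One way to turn labeled tweets into hourly anxious/calm series like the ones in that graph, sketched in Python. It assumes each tweet already carries a label from some upstream classifier (hypothetical here) and a timezone-aware timestamp. `hourly_counts`, `series`, and `share` are illustrative names. Plotting the anxious *share* per hour, rather than raw counts, separates "more traffic overall" from "more anxious tweets".

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset. March 11, 2011 fell before US daylight saving time
# began (March 13, 2011), so plain EST is correct for that day.
EST = timezone(timedelta(hours=-5), "EST")


def hourly_counts(tweets):
    """Count tweets per (EST hour, label).

    `tweets` yields (timestamp, label) pairs with timezone-aware timestamps.
    Labels such as "anxious" or "calm" are assumed to come from a classifier.
    """
    counts = Counter()
    for ts, label in tweets:
        counts[(ts.astimezone(EST).hour, label)] += 1
    return counts


def series(counts, label):
    """24-element list of counts for one label, indexed by EST hour."""
    return [counts[(h, label)] for h in range(24)]


def share(counts, label, labels):
    """Per-hour fraction of labeled tweets that carry `label` (0.0 if none)."""
    out = []
    for h in range(24):
        total = sum(counts[(h, lab)] for lab in labels)
        out.append(counts[(h, label)] / total if total else 0.0)
    return out


if __name__ == "__main__":
    utc = timezone.utc
    sample = [
        (datetime(2011, 3, 11, 6, 10, tzinfo=utc), "anxious"),
        (datetime(2011, 3, 11, 6, 40, tzinfo=utc), "anxious"),
        (datetime(2011, 3, 11, 6, 50, tzinfo=utc), "calm"),
    ]
    counts = hourly_counts(sample)
    # 06:10-06:50 UTC falls in the 1 a.m. EST hour.
    print(series(counts, "anxious")[1], share(counts, "anxious", ["anxious", "calm"])[1])
```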
We also analyzed the tweets for emotion, flagging each one as either "happy" or "sad", but we don't have that data in a displayable format at the moment.
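The commenter doesn't say how the happy/sad flag was assigned. A simple lexicon-based approach is one possibility, sketched below under stated assumptions: the word lists are hypothetical seeds, and negation is handled with a crude heuristic where a negator flips the polarity of the next sentiment word. That heuristic matters for phrases like "I don't feel happy".

```python
import re

# Hypothetical seed lexicons, for illustration only.
HAPPY_WORDS = {"happy", "glad", "great", "excited", "joyful"}
SAD_WORDS = {"sad", "unhappy", "depressed", "lonely", "miserable"}
# Negators flip the polarity of the next sentiment word that follows them.
NEGATORS = {"not", "don't", "dont", "never", "no"}

WORD_RE = re.compile(r"[a-z']+")


def flag_emotion(text):
    """Return "happy", "sad", or None when the tweet scores neutral."""
    score = 0
    negate = False
    for word in WORD_RE.findall(text.lower()):
        if word in NEGATORS:
            negate = True
            continue
        # +1 for a happy word, -1 for a sad word, 0 otherwise.
        polarity = (word in HAPPY_WORDS) - (word in SAD_WORDS)
        if polarity:
            score += -polarity if negate else polarity
            negate = False  # a negator only applies to one sentiment word
    if score > 0:
        return "happy"
    if score < 0:
        return "sad"
    return None
```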
Out of curiosity, is Twitter data such as this freely available to anyone, or was this specially acquired for this set of students? I can imagine a number of interesting projects that might arise out of such a data set.
I've also been collecting Twitter data for a while. I developed a heatmapping application that runs on the GPU and produces time-animated heatmaps in real time for any user-generated query over a Solr database of hundreds of millions of geotagged tweets. You can see a rough demo at http://youtu.be/4_v2EZGiA7w . Hopefully I'll release it as a web app when I get time this summer.
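The tool described above runs on the GPU against Solr. The plain-Python sketch below only illustrates the binning step that a time-animated heatmap needs: time frames crossed with a latitude/longitude grid, with each frame scaled for color mapping. It is not the commenter's implementation. Cell size, frame length, and the function names are assumptions.

```python
import math
from collections import defaultdict


def bin_points(points, cell_deg=1.0, frame_seconds=3600):
    """Bin geotagged tweets into per-frame grid-cell counts.

    points: iterable of (unix_time, lat, lon).
    Returns {frame_index: {(row, col): count}}.
    """
    n_rows = math.ceil(180.0 / cell_deg)
    n_cols = math.ceil(360.0 / cell_deg)
    frames = defaultdict(lambda: defaultdict(int))
    for t, lat, lon in points:
        frame = int(t // frame_seconds)
        # Clamp so lat == 90 or lon == 180 land in the last cell, not past it.
        row = min(int((lat + 90.0) // cell_deg), n_rows - 1)
        col = min(int((lon + 180.0) // cell_deg), n_cols - 1)
        frames[frame][(row, col)] += 1
    return {f: dict(cells) for f, cells in frames.items()}


def normalize_frame(cells):
    """Scale one frame's counts to [0, 1] for color mapping."""
    peak = max(cells.values(), default=0)
    return {k: v / peak for k, v in cells.items()} if peak else {}
```

Each frame is an independent 2D histogram, which is why this step parallelizes well on a GPU.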
Wow... This is awesome. I actually did a project for my high school science fair that focused on analyzing Twitter. It was nowhere near as sophisticated, but it really opened my eyes to the massive amount of data out there and to the availability of commodity hardware that can actually handle terabytes of it.
But this is because non-English tweets, which we discard, are much more frequent during the night in our time zone, and they don’t contain the word ‘a’ as often as English tweets do.
This doesn't make sense; are they only discarding the non-English tweets during certain times?
Why don't you offer an option to normalize against the number of English-language tweets (or any other language you can pick out) in a given hour? You could build your own classifier or use something like Apache Tika. Regardless, really, really nice work. The graph is beautiful and highly functional. Did you guys write the graph lib yourselves?
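A minimal sketch of the suggested normalization, in Python. It divides the number of English tweets containing a term in each hour by the total number of English tweets in that hour. Non-English tweets then drop out of both numerator and denominator, so their nighttime surge no longer drags the curve down. The language-ID predicate is a placeholder for whatever classifier you use. The function name is hypothetical, and the tokenizer is a deliberately simple regex.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")


def normalized_term_rate(tweets, term, is_english):
    """Per-hour share of English tweets that contain `term` as a word.

    tweets: iterable of (hour, text) pairs.
    is_english: any language-ID predicate (your own classifier, a library wrapper).
    Returns {hour: fraction}; hours with no English tweets are omitted.
    """
    term = term.lower()
    english_total = Counter()
    term_hits = Counter()
    for hour, text in tweets:
        if not is_english(text):
            continue  # excluded from both numerator and denominator
        english_total[hour] += 1
        if term in TOKEN_RE.findall(text.lower()):
            term_hits[hour] += 1
    return {h: term_hits[h] / n for h, n in english_total.items()}
```

For a quick test, `str.isascii` can stand in as a toy predicate. It is not a real language detector.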
Hi, if you like that kind of stuff, I could introduce you to Peter Gloor, author of swarmcreativity.net, who is at the MIT Center for Collective Intelligence. (Tags: #Twitter, stock prediction, mood, etc.) You might meet on campus :)
Interesting data. I would be curious to see how general sentiment correlates with consumer behavior, e.g., financial market swings, purchases on amazon.com, Google searches, etc.
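A first pass at that question could be a Pearson correlation between a daily sentiment series and some other daily series, such as market returns or search volume. The sketch below also computes it at a lag. It assumes both series are already aligned by day. No real data is used here, and the function names are illustrative. Trying several lags is a common way to ask whether mood tends to lead the other series rather than just move with it.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / math.sqrt(vx * vy)


def lagged_correlation(sentiment, other, lag):
    """Correlate sentiment on day t with the other series on day t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(sentiment, other)
    return pearson(sentiment[:-lag], other[lag:])
```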
http://truthy.indiana.edu