Looking At The World Through Twitter Data

Anon84 · on May 8, 2012

If you're interested in how information diffuses through social networks like Twitter, take a look at Truthy (one of my projects):

http://truthy.indiana.edu

     Truthy is a system to analyze and visualize the
     diffusion of information on Twitter. The Truthy system
     evaluates thousands of tweets an hour to identify new
     and emerging bursts of activity around memes of various 
     flavors. The data and statistics provided by Truthy are
     designed to aid in the study of social epidemics: How do
     memes propagate through the Twittersphere? What causes a
     burst of popularity?

arashdelijani · on May 9, 2012

Sounds like a cool thing to tackle. We'll definitely look at it!

Anon84 · on May 9, 2012

Great. Let me know if you have any questions or suggestions. I'm nearby at Northeastern.

TravisPe · on May 9, 2012

A friend and I started playing around with Twitter data back in early 2010. We currently have roughly 587 million tweets collected (we stopped collecting earlier this year). We only pulled English tweets, and only those that described what someone was feeling (Im, I am, I feel, I am feeling, etc., along with the negatives I don't feel, I do not feel, etc.).
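
As a rough illustration of that kind of filter (a sketch only, not the code that was actually run): the regular expression below is an assumption built from the phrases listed above, and the names `FEELING_RE` and `is_feeling_tweet` are hypothetical.

```python
import re

# Hypothetical pattern covering the phrases mentioned in the comment:
# "Im" / "I'm" / "I am" (optionally followed by "feeling"), "I feel",
# and the negatives "I don't feel" / "I do not feel".
FEELING_RE = re.compile(
    r"\b(?:i\s+(?:do\s+not|don'?t)\s+feel"
    r"|i\s+am(?:\s+feeling)?"
    r"|i'?m(?:\s+feeling)?"
    r"|i\s+feel)\b",
    re.IGNORECASE,
)


def is_feeling_tweet(text: str) -> bool:
    """Return True if the tweet contains a first-person feeling phrase."""
    return FEELING_RE.search(text) is not None
```

Language detection (keeping only English tweets) is a separate step that this sketch does not attempt.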

We were able to see some interesting events happen during that time, though. This is a graph of the anxiety levels on Twitter on March 11th; the bottom axis is the hour of the day in EST. The earthquake hit Japan at 1:46 EST.

http://i.imgur.com/BeBwa.jpg

There is a strange dip around noon that we are unsure of how to account for as our servers did not report any failures.

It was a fun project to play around with.

Anon84 · on May 9, 2012

That type of daily behavior is common in human activities. It matches nicely with typical daily activities like waking up, having lunch, etc... See http://www.bgoncalves.com/component/jdownloads/finish/3/17.h... for an example in web page traffic, http://www.barabasilab.com/pubs/CCNR-ALB_Publications/200805... for cell phone traffic, etc...

cpeterso · on May 9, 2012

> There is a strange dip around noon that we are unsure of how to account for as our servers did not report any failures.

Maybe people are away from their computer at lunch.

What do the blue and green line colors indicate? It would also be interesting to track emoticons. :)

TravisPe · on May 9, 2012

The green line represents the number of tweets that were marked as being anxious and the blue line represents tweets marked as calm.

You can see that after the tsunami hit there was a general spike in the overall traffic, but a much larger spike for tweets where the user described being anxious.

We also analyzed the tweets for emotions, flagging each as either "happy" or "sad". We don't have that data in any consumable format at the moment, though.

These are the daily totals from our logs:

                   Calm   Anxious Happy   Sad
  2011-03-08       2034   8730    77032   94119
  2011-03-09       1349   5129    47708   59406
  2011-03-10       1614   6020    51623   72214
  2011-03-11       4126   20427   87763   126688
  2011-03-12       3251   13009   104434  136389

We had 96 adjectives we used to filter for anxiety and 3242 adjectives we used for emotions (happy/sad).
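
One way to read the daily totals above is as a ratio of anxious to calm tweets per day. The sketch below only copies the Calm and Anxious columns from the table; the names `totals`, `anxious_to_calm`, and `ratios` are hypothetical, and the ratio itself is an assumption about how one might summarize the data, not something computed in the original project.

```python
# Daily totals copied from the table above: day -> (calm, anxious).
totals = {
    "2011-03-08": (2034, 8730),
    "2011-03-09": (1349, 5129),
    "2011-03-10": (1614, 6020),
    "2011-03-11": (4126, 20427),
    "2011-03-12": (3251, 13009),
}


def anxious_to_calm(calm: int, anxious: int) -> float:
    """Number of anxious tweets per calm tweet."""
    return anxious / calm


ratios = {day: anxious_to_calm(calm, anxious) for day, (calm, anxious) in totals.items()}

for day, ratio in ratios.items():
    print(f"{day}  anxious/calm = {ratio:.2f}")
```

On these numbers the ratio is highest on 2011-03-11, the day of the earthquake.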

Permit · on May 9, 2012

Out of curiosity, is Twitter data such as this freely available to anyone, or was this specially acquired for this set of students? I can imagine a number of interesting projects that might arise out of such a data set.

Anon84 · on May 9, 2012

You need to get whitelisted to have access to it. The only problem is that since they partnered with Gnip, Twitter no longer gives whitelist access for free (http://www.readwriteweb.com/archives/twitter_to_sell_50_of_a... ).

I was lucky enough to get it more than two years ago and have been accumulating data ever since.

tmostak · on May 9, 2012

I've also been collecting Twitter data for a bit. I developed a heatmapping application that runs on the GPU to produce time-animated heatmaps in real time for any user-generated query over a Solr database of hundreds of millions of geotagged tweets. You can see a rough demo at http://youtu.be/4_v2EZGiA7w . Hopefully I'll release it as a web app when I get time this summer.

seeingfurther · on May 9, 2012

We've been working on a similar project since last year @ http://smogfarm.com/ Feel free to get in touch!

akshaykarthik · on May 9, 2012

Wow... This is awesome. I actually did a project for my high school science fair that focused on analyzing Twitter. It was nowhere near as sophisticated, but it really opened my eyes to the massive amount of data and the availability of commodity hardware that can actually handle terabytes of data.

joejohnson · on May 9, 2012

> But, this is because non-English tweets that we have discarded are much more frequent during the night in our time zone, and they often don’t contain the word ‘a’ as often as English tweets do.

This doesn't make sense; are they only discarding the non-English tweets during certain times?

arashdelijani · on May 9, 2012

We just mean that there's more tweeting going on in non-English speaking countries when it's night-time here.

tmostak · on May 9, 2012

Why don't you offer an option to normalize against the number of English-language tweets (or any other language you can pick out) over a given hour? You could build your own classifier or use something like Apache Tika. Regardless, really really nice work. The graph is beautiful and highly functional. Did you guys write the graph lib yourselves?
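
A minimal sketch of that normalization, assuming hourly counts are already available as dictionaries keyed by hour: each hour's count of matching tweets is divided by the number of English tweets seen in the same hour. The function name `normalize_by_hour` and its inputs are hypothetical, and language detection is assumed to have happened upstream.

```python
from typing import Dict


def normalize_by_hour(
    matches: Dict[int, int], english_totals: Dict[int, int]
) -> Dict[int, float]:
    """Fraction of English tweets in each hour that matched the query.

    Hours with no English tweets are skipped rather than divided by zero;
    hours with no matches get a fraction of 0.0.
    """
    return {
        hour: matches.get(hour, 0) / total
        for hour, total in english_totals.items()
        if total > 0
    }
```

Plotting these fractions instead of raw counts would keep a quiet night-time hour from looking like a drop in the word's popularity.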

jermaink · on May 9, 2012

Hi, if you like that kind of stuff, I might give you an intro to Peter Gloor, who is the author of swarmcreativity.net and at the MIT Center for Collective Intelligence. Tag #Twitter, Stock Prediction, Mood etc. You might meet on campus :)

arashdelijani · on May 9, 2012

We know Peter, actually. He's a great guy and we've been talking to him about this. Thanks though! :)

grout · on May 9, 2012

http://topsy.com/ yeah. we do that.

akg · on May 9, 2012

Interesting data. I would be curious to find out how the general sentiment correlates with consumer behavior, e.g., financial market swings, purchases on amazon.com, google searches, etc.

Anon84 · on May 9, 2012

Twitter Mood Predicts the Stock Market http://www.sciencedirect.com/science/article/pii/S1877750311... (and also in the arXiv http://arxiv.org/abs/1010.3003 )

roarktoohey · on May 9, 2012

It would be cool, possibly profitable, to see stock symbols and their price change mapped vs. mentions of the ticker (like IBM).
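
As a sketch of how such a comparison might start, assuming you already have one series of daily mention counts for a ticker and one series of daily price changes, a plain Pearson correlation between the two is a simple first check. The function name `pearson` is hypothetical and no real market or Twitter data is used here.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and each series' sum of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A high correlation on its own would not show that mentions predict price moves; shifting one series by a day or more would be the next thing to try.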

tzm · on May 8, 2012

Great work. I'll be following your updates. I'm building a platform for developers to crunch such APIs / data sets.

mrlinx · on May 9, 2012

Is any of this data available? It would be great to have access to it.

christiangenco · on May 9, 2012

A bit off-topic, but what did you use to draw the graphs?

arashdelijani · on May 9, 2012

We used Flot after trying out quite a few other libraries: http://code.google.com/p/flot/

molsongolden · on May 9, 2012

My friend Kang and I

Ahhhhhhh