TR Editors' blog

What Twitter Learns from All Those Tweets

The company's head of analytics explains how Twitter mines the data users produce.

Erica Naone 09/28/2010

Twitter messages might be limited to 140 characters each, but all those characters can add up. In fact, they add up to 12 terabytes of data every day.

"That would translate to four petabytes a year, if we weren't growing," said Kevin Weil, Twitter's analytics lead, speaking at the Web 2.0 Expo in New York. Weil estimated that users would generate 450 gigabytes during his talk. "You guys generate a lot of data."
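The quoted figures can be checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB) and a constant daily rate; the function names are illustrative, and the article does not say how long the talk ran, so the duration is only what the 450-gigabyte estimate implies.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# Assumptions (not from the source): decimal units and a constant daily rate.
TB_PER_DAY = 12  # daily data volume quoted in the article


def petabytes_per_year(tb_per_day, days=365):
    """Yearly volume in petabytes if the daily rate never changed."""
    return tb_per_day * days / 1000


def minutes_to_generate(gigabytes, tb_per_day):
    """Minutes needed to produce `gigabytes` of data at the given daily rate."""
    gb_per_minute = tb_per_day * 1000 / (24 * 60)
    return gigabytes / gb_per_minute


yearly_pb = petabytes_per_year(TB_PER_DAY)
talk_minutes = minutes_to_generate(450, TB_PER_DAY)

print(f"{yearly_pb:.2f} PB per year")
print(f"{talk_minutes:.0f} minutes to generate 450 GB")
```

At 12 terabytes a day, a year comes to roughly 4.4 petabytes, in line with Weil's "four petabytes," and 450 gigabytes corresponds to a talk of a little under an hour.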

This wealth of information seems overwhelming, but Twitter believes it contains insights that could be useful to it as a business. For example, Weil said the company tracks when users shift from posting infrequently to becoming regular participants, and looks for features that might have influenced the change. The company has also determined that users who access the service from mobile devices typically become much more engaged with the site. Weil noted that this supports the push to offer Twitter applications for Android phones, iPhones, BlackBerrys, and iPads. And Weil said Twitter will be watching closely to see if the new design of its website increases engagement as much as the company hopes it will.

This visualization shows the connections between users.
Credit: Phillie Casablanca

Of course, Twitter also tracks simple statistics, such as how many searches are being performed on its site, where users are located, and what domains users link to most frequently. But Weil says the company also uses machine learning techniques to figure out what kinds of tweets resonate most with users (these tweets are reposted automatically through its "TopTweets" account).

Twitter is also asking some more open-ended questions. Weil said the company is interested in what influences retweets (posts from one user that are reposted by another). And Twitter has discovered that it can make good guesses about the topics a user is interested in by looking at the accounts that user follows that don't follow back.
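The idea of reading interests from one-way follows can be illustrated with a toy example. This is a minimal sketch, not Twitter's method, which Weil did not describe: the follow graph, the account names, the topic labels, and both function names are hypothetical, and the "guess" is simply a count of topics among the accounts a user follows that do not follow back.

```python
from collections import Counter


def one_way_follows(user, follows):
    """Accounts that `user` follows but that do not follow `user` back."""
    return {
        account
        for account in follows.get(user, set())
        if user not in follows.get(account, set())
    }


def guess_interests(user, follows, account_topics):
    """Rank topics by how many one-way follows carry each (hypothetical) label."""
    counts = Counter(
        account_topics[account]
        for account in one_way_follows(user, follows)
        if account in account_topics
    )
    return counts.most_common()


# Hypothetical follow graph: each key follows every account in its set.
follows = {
    "alice": {"bob", "space_agency", "science_mag", "sports_news"},
    "bob": {"alice"},
}

# Hypothetical topic label per account.
account_topics = {
    "bob": "cooking",
    "space_agency": "science",
    "science_mag": "science",
    "sports_news": "sports",
}

print(guess_interests("alice", follows, account_topics))
```

In this toy graph, bob follows alice back and is therefore ignored, so the guess rests only on the three accounts alice follows one-way: two labeled science and one labeled sports.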

Asking such specific questions of huge quantities of data is a common problem for successful Web companies. Weil explained that Twitter benefits from a variety of open-source software developed by companies such as Google, Yahoo, and Facebook. These tools are designed to deal with storing and processing data that's too voluminous to manage on even the largest single machine.

Even so, Twitter sometimes struggles with not having enough hardware. Weil said the company has run out of space in its data center, and that the 100-machine cluster it currently uses to process data is significantly less powerful than what it really needs. Twitter plans to move to a new data center later this year, and he hopes to get three to four times the capacity there.

Weil also said that Twitter is interested in doing more real-time analysis of tweets, but he didn't give details about how the company plans to mine this new trove of data.

luddite

407 Comments

  • 965 Days Ago
  • 09/29/2010

much ado about nothing

Twitter has so much online chatter,
That some mine it for all of its data,
And every nugget of information,
Provides a complete explanation,
As to why none of it actually matters.

hanshusman

20 Comments

  • 965 Days Ago
  • 09/29/2010

A complex area of possibilities and limitations in practical models and hardware

Yeah, I doubt, as I think Technology Review pointed out in the context of an article in some journal, that this approach gives much compared to other similar methods. I mean, you do not get that big a population that way, and it doesn't usually get much better afterward.

Other ways for sure could give more: viewing from more dimensions regarding an understanding of human cognition, and being able to use relations of the type P(a | x,...) and so on. Track change and map it to the physical energy investment of the person, no matter the time spent, money spent, and so on, as well as the information entropy as expressed directly or through graphs.

That of course would be much more CPU-costly, and how much it gives in general might not be much. For some areas, though, I am quite sure it could predict important things, such as mood correlation to, let's say, hierarchical expressions in politics or economics. These are areas where we are quite sure of such effects but aren't able to follow them very well.

Hard to tell, though. It is costly in CPU and even more in memory for sure, and I very much doubt that Twitter here is more relevant than, say, media in general, or letters people send to newspapers on issues, or blogs, and so on.

Twitter is fast communication, and that might suit predicting book and movie sales and such, but there other methods do exist. Though it might be a cheap alternative for others, of course, just walking the published tweets. It would help if it was better sorted, perhaps, but of course they would like to sell such services, so it will probably not come.

sandeep9

1 Comment

  • 879 Days Ago
  • 12/24/2010

twitter learns

I think Twitter has so much online chatter,
That some mine it for all of its data,
And every nugget of information,
Provides a complete explanation,
As to why none of it actually matters. And it gives lots of information for business and other fields, so Twitter is most important for our people.
