Datasift – Realtime Twitter Query & Curation for Developers
Datasift is getting more exposure this week. Here is Scoble’s interview with Nick Halstead (founder of Datasift, Tweetmeme) – http://www.youtube.com/watch?v=X7aiKaCi8O8
Datasift looks like a great accomplishment that tackles an enormous challenge in Realtime content flow. Right now the focus is specifically the Twitter “Firehose”, but I imagine that as they scale to handle Twitter they will also try to handle and connect other pipelines (Facebook, Google Buzz, PuSH feeds, etc.). Datasift is like FaaS (Firehose as a Service) or, more appropriately, RaaS (Realtime as a Service). Developers and companies can use the Datasift tool (currently a web interface) to power their own services and products (e.g. build a new custom Tweetmeme). Terms and licensing costs apply depending on how Datasift is used.
Datasift is essentially a service that you’d expect Twitter itself to offer, but since Twitter is focused on scaling and on dealing with scope creep in its more fundamental public functionality, Datasift has been able to step in and step up the game of Realtime Data Management. Twitter will probably settle for the ease of just charging for usage of the raw Firehose, which Google and other companies already pay for. I am not sure what deal Datasift has in place with Twitter or other partners. I imagine there is a deal in place that keeps potential acquisition offers alive, maybe after Datasift has proven itself by investing in the hardware and staff needed to keep such an ambitious project up and running for the hundreds or thousands of services that will surely line up to use it.
Datasift has its own API Query Language. It looks like this (taken from Scoble’s video):
twitter.user IN ("scobleizer", "nickhalstead", "whoever")
AND
twitter.text contains "google"
AND
rule "noswearing"
As you can imagine, the flexible query language will allow for a wide array (unlimited?) of data set results as long as that data is a part of a tweet’s object metadata. A tweet of course has a bunch of metadata beyond the “message” content. Datasift was built to leverage this valuable data. It will be even more interesting once Twitter Annotations is officially supported.
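To make the idea concrete, here is a minimal Python sketch of what the example query above expresses, applied locally to a few made-up tweet-like dictionaries. This is not Datasift's engine or API. The field names `user` and `text`, the case-insensitive matching, and the placeholder word list standing in for the "noswearing" rule are all my own assumptions for illustration.

```python
# Hypothetical local emulation of the example filter; not Datasift's engine or API.

# Placeholder word list standing in for whatever the "noswearing" rule really checks.
BLOCKED_WORDS = {"darn", "heck"}


def no_swearing(tweet):
    """Return True if none of the tweet's words are in the placeholder block list."""
    words = tweet["text"].lower().split()
    return not any(w.strip(".,!?") in BLOCKED_WORDS for w in words)


def matches(tweet, users=("scobleizer", "nickhalstead", "whoever"), keyword="google"):
    """All three conditions must hold, mirroring the AND-ed clauses of the query."""
    return (
        tweet["user"].lower() in users          # twitter.user IN (...)
        and keyword in tweet["text"].lower()    # twitter.text contains "google" (case-insensitivity assumed)
        and no_swearing(tweet)                  # rule "noswearing"
    )


# Made-up sample data, only to exercise the filter.
tweets = [
    {"user": "scobleizer", "text": "Google just shipped something new"},
    {"user": "someoneelse", "text": "Google is down"},
    {"user": "nickhalstead", "text": "Heck, Google again!"},
]

selected = [t for t in tweets if matches(t)]
```

Only the first sample tweet survives: the second fails the user clause and the third trips the placeholder swearing rule. The real service presumably evaluates this kind of predicate against the live stream rather than a list in memory.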
Datasift being an API means that developers could also choose to create their own UI for a tool with custom functionality instead of using the official Datasift.net version. That is very flexible and open, and it is what happens when you begin by creating an API to provide a Service as opposed to building just the “app” alone.
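A custom UI would mostly need to turn form input into query text. Here is a small Python sketch of that step, assuming the query syntax is exactly what appears in the video. The `build_query` helper is my own invention, and how the resulting text would actually be submitted to Datasift is unknown to me, so that part is left out.

```python
# Hypothetical helper for a custom front end; the syntax is copied from the
# example in the video, and nothing here calls a real Datasift endpoint.

def build_query(users, keyword, rule=None):
    """Assemble query text in the style of the example shown in the video."""
    user_list = ", ".join(f'"{u}"' for u in users)
    clauses = [
        f"twitter.user IN ({user_list})",
        f'twitter.text contains "{keyword}"',
    ]
    if rule:
        clauses.append(f'rule "{rule}"')
    # Clauses are joined with AND on its own line, as in the example.
    return "\nAND\n".join(clauses)


query = build_query(["scobleizer", "nickhalstead", "whoever"], "google", "noswearing")
print(query)
```

Called with the values from the example, this reproduces the query text shown earlier in the post.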
I’m reminded of a few posts here on vocal.ly that I wrote last year that reflected on Realtime streams and the concepts of Stocks and Flows:
http://vocal.ly/2009/09/18/stocks-and-flows-and-the-real-time-web/
http://vocal.ly/2009/09/03/pondering-the-realtime-web-and-rapid-intelligence-collecting-thoughts/
I believe I signed up for Datasift a few weeks ago, so I hope I can at some point gain access to the service for experimentation, research and possibly for use with my own projects. We’ll see how all this evolves over time, but I commend Nick and his team for their vision and their willingness to spearhead this issue. Very cool!
Update:
A good summary of DataSift can also be found here:
http://www.skepticgeek.com/socialweb/datasift-curation-engine-aims-for-relevance-in-real-time/
