When you go to a fairly academic conference, it’s frowned upon to award a best in show. Yesterday I attended the Hadoop Summit expecting to hear about all the cool stuff Yahoo and Powerset were doing with Hadoop. By far, however, my award for “best use of Hadoop” goes to Facebook. Joydeep Sen Sarma and Ashish Thusoo gave a talk on a project called Hive that helps the analysts and engineers at Facebook grok their clickstream and logfile data. Good geeks are, well, geeks. I know many of them. What really impressed me about these two gentlemen and the Hive project was just how business-driven the work is.
Joydeep started his talk by saying, “We asked our current BI [business intelligence] users what tools they could and couldn’t use, and they told us they know how to use SQL.” So often technologists forget about their audience. Hive was developed iteratively by a two- or three-person team (I think Jeff Hammerbacher was also involved). It makes it easy for business analysts to ask ad hoc questions of terabytes of logfile data by abstracting MapReduce behind a SQL-like dialect. Think of it as a data warehouse sitting on top of thousands of servers’ logfiles. Beneath the surface, Hive runs on Hadoop and translates SQL-like queries into MapReduce jobs. It’s really a great use of technology. My highest compliments to the Facebook team for their work in this area.
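To make the translation idea concrete, here is a toy, single-process sketch in plain Python of how a SQL-style `GROUP BY` count over clickstream data can be decomposed into map, shuffle, and reduce phases. This is not Hive’s actual code or query planner; the `clicks` table name, the tab-separated log format, and all function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical clickstream log lines in an assumed "user_id<TAB>page" format.
LOG_LINES = [
    "u1\t/home",
    "u2\t/profile",
    "u1\t/home",
    "u3\t/photos",
    "u2\t/home",
]


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (page, 1) for every log line, keyed by the GROUP BY column."""
    for line in lines:
        _user, page = line.split("\t")
        yield page, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: collect all values that share a key, as Hadoop does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: aggregate each group; COUNT(*) becomes a sum of the emitted 1s."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(lines: Iterable[str]) -> Dict[str, int]:
    # Equivalent in spirit to: SELECT page, COUNT(*) FROM clicks GROUP BY page
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    for page, hits in sorted(run_query(LOG_LINES).items()):
        print(page, hits)
```

Running it prints `/home 3`, `/photos 1`, and `/profile 1`. In a real cluster the map and reduce steps run in parallel across many machines over files in HDFS; the point of the sketch is only that an analyst writes the one-line query while the system generates the map and reduce logic.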
I’d also like to commend IBM Research for their work on JAQL, which is essentially a query interface into a JSON data store. It’s really intriguing. Conceptually I love JAQL and think it could be extremely useful. My concern is that it comes out of IBM Research, and I wonder how open its open source license will be once it gets through IBM legal.
The Hadoop Summit was a great day-long event attended by about 400 folks interested in Internet-scale computing. It was a pleasant surprise to learn that folks other than the ones I expected are doing really interesting and innovative work.
One Response to Hadoop Summit – Best in Show
[...] http://blog.socrata.com/index.php/2008/03/26/hadoop-summit-best-in-show/ [...]