With Hadoop World NYC just around the corner on October 2, 2009, I thought I’d share two pieces of news.
First, I’ve received a 25% discount code for readers thinking about attending Hadoop World. Hurry, because the code expires on September 21. Use: http://hadoop-world-nyc.eventbrite.com/?discount=hadoopworld_promotion_infoworld
Second, here’s a Q&A with New York Times software engineer and Hadoop user Derek Gottfrid. Derek is doing some very cool work with Hadoop and will be presenting at Hadoop World.
Question: What got you interested in Hadoop initially and how long have you been using Hadoop?
Gottfrid: I’ve been working with Hadoop for the last three years. Back in 2007, the New York Times decided to make all of its public domain articles from 1851 to 1922 available free of charge as images scanned from the original paper. That’s eleven million articles, available as images in PDF format. The code to generate the PDFs was fairly straightforward, but getting it to run in parallel across multiple machines was a challenge. As I wrote about in detail back then, I came across Google’s MapReduce paper. That, coupled with what I had learned about Hadoop, got me started on the road to tackling this huge data challenge.
Question: How do you use Hadoop at the NY Times and why has it been the best solution for what you’re trying to accomplish?
Gottfrid: At the New York Times, we continue to use Hadoop for one-time batch processing of tremendous volumes of image data. We’ve also moved up the food chain and use Hadoop for traditional text analytics and web mining. It’s the most cost-effective solution we’ve found for processing and analyzing large data sets, such as user logs.
Question: How would you like to see Hadoop evolve? Or, what are the three features you’d most like to see in Hadoop?
Gottfrid: I’d like to see the Hadoop roadmap, as well as the individual subprojects, clarified to get rid of some of the weird interdependencies, so that we can get to a legitimate 1.0 release that solidifies the APIs.
Question: What can attendees expect to learn about Hadoop from your presentation at Hadoop World?
Gottfrid: In my session, which I’ve titled “Counting, Clustering and Other Data Tricks,” I’m planning to take attendees on the journey I’ve gone through at the New York Times using Hadoop, from simple stuff like image processing to the more sophisticated web analytics use cases I’m working on today.
Question: What are you hoping or expecting to get out of Hadoop World?
Gottfrid: I attended the Hadoop Summit in Silicon Valley, and now I’m interested to see what people in our eastern region are doing with Hadoop. I’m always open to learning new tips and tricks to better leverage the platform.
I’ll be at Hadoop World to find out how companies are using Hadoop today, and what use cases will pop up in the future.
Will you be there?
Follow me on Twitter: SavioRodrigues
PS: I should state: “The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.”