News from today’s Hadoop Summit ’09 got me thinking about the importance of open community again.  According to a tweet from Joey Echeverria at the summit, Yahoo! representatives feel:

“Yahoo!: Easier to take an open source project and add steam to it rather than write something from scratch. #hadoopsummit09”

What a difference a year and a half makes.  Back in December 2007, there were roughly 10 Yahoo! employees working on Hadoop, and only five or six outside contributors.  Hadoop’s founder said at the time:

“It’s dominated by Yahoo, it would be great for the project to have a more balanced team.”

Today, Hadoop core has 185 contributors, only 30% of whom are Yahoo! employees.

Also, Cloudera, a commercial company aiming to bring Hadoop to the enterprise, has just contributed a new database tool for Hadoop.  The tool, SQOOP, enables users to import large database tables directly into Hadoop.  According to Cloudera founder Christophe Bisciglia:

“SQOOP is a tool that enterprise customers were demanding,” Bisciglia said. “Enterprises have lots of data in existing databases, and if you can’t give them a way to interact with that data, Hadoop isn’t as useful as it could be.”

Much like Kernel.org, Apache HTTPD, and Eclipse before it, a meritocratic, open community is unlocking opportunities for the ecosystem, which in turn is helping Hadoop evolve a lot faster than it could within any one vendor’s corporate walls.

Today, it appears that most of the contributing vendors are collaborators, with little, if any, head-to-head competition.  That will surely change over time.  But that’s a good thing: more vendors, more developers, more ideas, more innovation.

One can’t help but wonder what Google thinks of Hadoop’s progress.

Cloudera, an open source startup working to expand the use of Apache Hadoop, made two announcements today.  First, it has secured $5 million in Series A funding.  Second, it has announced the availability of the Cloudera Distribution for Hadoop.

What’s Hadoop? It’s a platform for developing applications that can process vast datasets while scaling to the levels that companies like Google, Facebook and Yahoo require.  Hadoop is an Apache project that:

“implements MapReduce, using the Hadoop Distributed File System (HDFS) (see figure below.) MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located.”
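For readers who have not seen the model before, here is a minimal sketch of the idea in plain Python.  It is not the Hadoop API, and every function name in it is made up for illustration.  It assumes a word count job, runs in a single process, and does not model HDFS replication or moving work to where the data lives; it only shows input being divided into small blocks, each block being mapped independently, and the results being grouped and reduced.

```python
from collections import defaultdict
from itertools import chain


def split_into_blocks(lines, block_size):
    """Divide the input into small blocks of work (hypothetical helper)."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Map step: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, block_size=2):
    # Each block is mapped on its own; on a real cluster these calls
    # would run in parallel on different nodes.
    blocks = split_into_blocks(lines, block_size)
    mapped = chain.from_iterable(map_block(block) for block in blocks)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    sample = ["hadoop scales", "Hadoop stores data", "data data"]
    print(word_count(sample))
```

Because each block is mapped without reference to any other block, the map calls can be spread across as many machines as there are blocks, which is where the scaling comes from.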

Cloudera sees a market for Hadoop in enterprise situations ranging from analyzing genome and protein data to oil and gas exploration and financial processing.  The Cloudera Distribution for Hadoop is open source and licensed under the Apache License 2.0.  Cloudera intends to drive revenue from support and implementation services.  I’ve typically been down on support- or services-based open source businesses.  However, in the case of Cloudera, this model makes sense, at least for now.  The number of people outside Google, Yahoo, Facebook and the like who can implement a highly scalable application that processes petabytes of independent data relationships using the MapReduce programming model can probably be counted on two hands.  There is a degree of education and hand-holding that Cloudera needs to do while enterprise developers explore writing this style of application.

Take a look at the investors and it’s easy to predict that Mike Olson and team won’t be independent for long:

“In addition to Accel Partners, investors in Cloudera include Mike Abbott (senior vice president, Palm), David desJardins (early Google employee), Caterina Fake (co-founder, Flickr), David Gerster (entrepreneur), Youssri Helmy (entrepreneur), Dr. Qi Lu (president of the Online Services Group, Microsoft; former executive vice president, Yahoo!), Marten Mickos (former CEO, MySQL), In Sik Rhee (former chief tactician, Opsware; founder, Loudcloud), Jeff Weiner (president, LinkedIn; former senior vice president, Yahoo!), Dick Williams (CEO, Illustra; former CEO, Wily Technology), Gideon Yu (Facebook CFO; former senior vice president, Yahoo!; CFO, YouTube).”

All the best to the Cloudera team.