December 2010


Forking is often viewed as a last resort for software projects. However, the growth of GitHub and of distributed version control systems, along with a reluctant acknowledgement from a key Subversion vendor, suggests that forking is going to become commonplace in 2011. Plan ahead to ensure your company is ready for this shift in development methodology.

Centralized version control rules the day, today
Version control systems (VCS) fall into two broad categories – centralized and distributed. The merits of each have been widely debated and are beyond the scope of this post. Here is a good detailed explanation of the differences.

Centralized VCS relies on a central server hosting the main or trusted version of a project, often referred to as the “trunk”. Developers check out and check in code against that central copy of the project. There is only one authoritative copy of the entire source code for the project, on the central server. Developers on a project can only see a change from another developer once that developer has checked her changes in to the trunk.

Distributed VCS, on the other hand, are designed such that any repository could be considered the “main or trusted version” of the project. Each developer has the entire project’s source code in a local repository on his or her computer. As such, developers on a team can share changes directly between their local repositories before merging those changes into a common centralized repository.
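To make the distributed model concrete, here is a minimal sketch in Python. It is a toy illustration, not how Git or Mercurial are implemented: the Repository class, its methods and the developer names are all hypothetical, and history is reduced to a plain list of change descriptions.

```python
class Repository:
    """Toy repository: a name plus an ordered list of change descriptions."""

    def __init__(self, name, history=None):
        self.name = name
        self.history = list(history or [])

    def commit(self, change):
        # Record a change locally; no server is involved.
        self.history.append(change)

    def clone(self, name):
        # A clone carries the full history, not just the latest files.
        return Repository(name, self.history)

    def pull_from(self, other):
        # Copy over any changes this repository has not seen yet.
        for change in other.history:
            if change not in self.history:
                self.history.append(change)


central = Repository("central", ["initial import"])
alice = central.clone("alice")
bob = central.clone("bob")

alice.commit("fix login bug")
bob.pull_from(alice)  # Bob takes the change straight from Alice

# Bob has the change while the common repository still does not.
shared_before_central = (
    "fix login bug" in bob.history and "fix login bug" not in central.history
)

central.pull_from(alice)  # later, the change lands in the common repository
print(shared_before_central, central.history)
```

In a centralized system the step where Bob pulls from Alice has no equivalent: her change would only become visible to him after it had been checked in to the central server.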

The vast majority of version control systems used in public open source projects and internal enterprise software projects are centralized in nature.

Analysis of over 240,000 open source projects tracked by Ohloh demonstrates an overwhelming skew towards centralized VCS usage such as svn, svnsync and CVS. Distributed VCS such as Git, Mercurial and Bazaar account for just 14 percent of usage.

Data from the 2010 Eclipse User survey, which can be used as a proxy for internal enterprise software project usage patterns, reveals a similar skew towards centralized VCS. Distributed VCS usage accounts for just 11 percent of the version control systems used by the 1,528 respondents to this question in the survey.

Name                           Responses   % of Responses
Distributed VCS: Git/GitHub          115             7.5%
Distributed VCS: Mercurial            51             3.3%
Centralized VCS: Subversion          989            64.7%
Centralized VCS: CVS                 214            14.0%
Centralized VCS: Other               159            10.4%
Total                               1528             100%

Source: 2010 Eclipse User Survey
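For readers who want to check the 11 percent figure, the arithmetic can be sketched in a few lines of Python. The response counts come from the survey table; the dictionary layout and the share function are illustrative choices of mine, not part of the survey.

```python
# Response counts from the 2010 Eclipse User Survey table,
# grouped by whether the tool is distributed or centralized.
responses = {
    "Git/GitHub": ("distributed", 115),
    "Mercurial": ("distributed", 51),
    "Subversion": ("centralized", 989),
    "CVS": ("centralized", 214),
    "Other": ("centralized", 159),
}

total = sum(count for _, count in responses.values())


def share(kind):
    """Percentage of all responses that fall under the given kind of VCS."""
    matching = sum(count for k, count in responses.values() if k == kind)
    return 100 * matching / total


distributed_share = share("distributed")
centralized_share = share("centralized")
print(total, round(distributed_share, 1), round(centralized_share, 1))
```

Git/GitHub and Mercurial together account for 166 of the 1,528 responses, which rounds to the 11 percent quoted above.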

This data suggests that open source projects are ahead of the curve in adopting distributed VCS. However, the 3 percentage point difference in usage between the two data sources could be well within the margin of error for each of the surveys.

Suffice it to say, distributed VCS are not commonplace in today’s software development practice. But that’s about to change.

Expect growing use of Git/GitHub, Mercurial and the like in 2011
Forrester analyst Jeffrey Hammond tweeted, “A sign that git has arrived,” linking to a press release from WANdisco. WANdisco is a key vendor behind the open source Subversion project, a centralized VCS. According to the press release:

“Enough is enough,” said David Richards, President and CEO of WANdisco. “Subversion gets a lot of criticism due to the shortcomings of branching and merging, especially when compared with GIT and others, and we simply don’t have the time to debate whether or not this should be done when it clearly should be.”

As a result, WANdisco will be devoting resources to improving Subversion’s branching and merging capabilities.

The press release clearly demonstrates that the growth of Git and other distributed version control systems is raising concern for WANdisco and some of the largest users of Subversion.

GitHub, a Git-based online community for collaborative development, counts over 508,000 users hosting 1,524,000 Git repositories as of this week.

According to an analysis by RedMonk’s Stephen O’Grady of repository type mentions on Hacker News, distributed VCS account for 86.5 percent of repositories mentioned, and 82.06 percent of the total mentions are for Git alone. As O’Grady explained, “this dataset is interesting not because it is representative of developers as a whole, but rather because it’s a community of technologists who are collectively ahead of the curve.”

Prepare for distributed version control in your enterprise
The growth of Git, GitHub and the forthcoming changes to Subversion give IT decision makers a reason to consider distributed version control systems in 2011.

As with any shift in the software industry, decision makers are advised to experiment with a distributed VCS on a small project to gain experience without impacting business critical systems or projects. A small trial project could help identify internal process changes required when shifting from the current centralized VCS to a distributed VCS.

Keep in mind that distributed version control systems place a complete copy of a project’s source code on each developer’s local computer. If your current development practice requires that only portions of the source code tree be available to certain developers, then you’ll need to use multiple repositories to represent an overall project and only give developers access to the appropriate repositories.

Additionally, if your developers use laptops and there is a risk of a laptop being lost or stolen, consider that the entire source code of the project is now on the laptop, versus just a working copy of a single branch under your traditional centralized version control system.

Finally, plan for training to help developers familiar with centralized VCS approaches learn how to quickly become productive in a distributed VCS environment.

None of these cautions should be viewed as reasons to ignore distributed VCS in your development environment in 2011.

Get ahead of the curve with distributed VCS. If developer views on GitHub are an indication, your developers will thank you.

A prediction in 2009 that Ubuntu usage was going to grow in the face of Red Hat’s Linux operating system dominance could easily have been laughed off. Yet, that’s exactly what Ubuntu has been able to pull off, thanks in part to developers and growing adoption of cloud computing.

Developers, ahead of the Ubuntu usage curve
Like many, I was quite surprised by results from the 2009 Eclipse User Survey which found strong adoption of Ubuntu on developer desktops and production servers alike.

Survey respondents selected Ubuntu for their developer desktops more than three times as often as Red Hat Enterprise Linux (RHEL) and Fedora combined. While surprising, this result could be explained away by the fact that Ubuntu is free and positioned as a user-friendly desktop alternative to Windows. RHEL, on the other hand, is a for-fee product targeted primarily at deployment servers, not desktops.

However, this reasoning failed to explain the strong usage of Ubuntu on deployment servers.

According to the 2009 Eclipse survey results, Ubuntu just barely trailed Red Hat on deployment servers, with 12 percent usage versus Red Hat’s 13.1 percent.

According to the 2010 Eclipse survey, Ubuntu usage on the developer desktop had increased to 18.3 percent, from 14.5 percent in 2009. Additionally, Ubuntu usage on deployment servers, at 12.6 percent, narrowly beat out Red Hat’s 12.5 percent.

In another data point, RedMonk analyst Stephen O’Grady analyzed data from Hacker News consisting of 1.7 million entries. O’Grady explains:

This dataset is interesting not because it is representative of developers as a whole, but rather because it’s a community of technologists who are collectively ahead of the curve.

O’Grady found nearly 10,000 mentions of Ubuntu versus fewer than 2,500 mentions of Red Hat Enterprise Linux or Fedora combined.

As with many recent trends in the IT industry, developers become ambassadors for products they enjoy using and have quickly become an early indicator for enterprise technology usage in the future.

Another key factor working in Ubuntu’s favor is cloud computing, and more specifically the usage of Amazon’s EC2 cloud. O’Grady’s analysis shows over 25,000 mentions of Amazon/AWS (Amazon Web Services). The next closest cloud provider mentioned in the Hacker News data set is Google, with its App Engine receiving approximately 3,000 mentions.

Canonical bets on cloud early, leaves Red Hat behind
Canonical’s early focus on cloud computing along with its partnerships with open source cloud vendors such as Eucalyptus helped to establish Ubuntu as the de facto Linux distribution for cloud deployments.

Data from The Cloud Market, which tracks Amazon EC2 cloud statistics, highlights the lead that Ubuntu has over other operating systems on EC2.

Take note of Red Hat’s position on the chart, the lowest line at the bottom. Even when Red Hat usage is combined with Fedora, the result still pales in comparison to Ubuntu usage, the highest line in the chart.

Red Hat is well aware of its position in the cloud computing arena and spent much of 2010 making cloud-related announcements in an attempt to close the gap. Judging by the statistics above, Red Hat’s announcements haven’t yet translated into significant cloud usage. Interestingly enough, even Windows usage, the green line, has far outgrown RHEL/Fedora usage on EC2.

Ubuntu in 2011
In a seemingly perfect storm, Ubuntu is benefiting from strong developer usage at the same time that developers are increasingly selecting Amazon’s EC2 cloud platform, which bodes well for continued Ubuntu success on EC2. As that occurs, IT decision makers will need to consider or reconsider Ubuntu for usage within the enterprise.

Rest assured that Red Hat won’t sit idly by during these discussions.

Watching Canonical/Ubuntu and Red Hat engage to win cloud workloads will be interesting to track in 2011. Can the upstart keep up its growth trajectory? Or will the gorilla be able to convert its enterprise market share into cloud workload share?

Follow me on Twitter at SavioRodrigues. I should state: “The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies, or opinions.”



Salesforce.com and several reports suggest that its acquisition of Heroku, a Ruby cloud platform provider, just delivered a large and growing developer audience to Salesforce.com’s door. But did it?

Salesforce.com as a developer destination
Salesforce.com’s recent acquisition of Heroku, along with Salesforce.com’s newly introduced Database.com offering, is raising the prospect of Salesforce.com becoming a major platform as a service (PaaS) player for developers. Or at least that’s what Salesforce.com would like you to think. Here’s how Salesforce.com’s CEO, Marc Benioff, described the motivation behind the acquisition:

“Ruby is the language of Cloud 2 [applications for real-time mobile and social platforms]. Developers love Ruby. It’s a huge advancement,” said Benioff. “It offers rapid development, productive programming, mobile and social apps and massive scale. We could move the whole industry to Ruby on Rails.”

Analyst James Governor of RedMonk, a firm that is very much in tune with developer trends, wrote positively about the acquisition. Governor believes that Heroku, because of its Ruby heritage, will in fact bring developers to Salesforce.com who may have previously looked elsewhere. Governor writes:

Salesforce avoids IT to sell to the business, while Heroku avoids IT to sell to developers. The two firms definitely have something in common. While Salesforce has done an outstanding job selling to line of business people, its direct outreach to developers through its Force.com PaaS platform and “Java-like” APEX language has been disappointing so far. Big Difference then- APEX is “Java-like”. Heroku is Ruby.

Even Engine Yard, a competitor of Heroku, agrees that Ruby is a developer favorite, but dismisses the idea that developers will wish to be tied to Salesforce.com. Tom Mornini, Engine Yard CTO and co-founder, explains:

No respectable developer wants to be on Salesforce.com. This could drive even more developers [to Engine Yard’s platform]…Ruby is the language for the cloud. If you are building apps, and you are building on the cloud, you have to build with Ruby

Ruby’s rise?
For Heroku to deliver Salesforce.com a large and growing number of developers, Ruby’s use should be growing.

According to the Tiobe index of the top 50 programming languages, Ruby’s usage declined in 2010 and has been, at best, relatively flat since 2007. The dark purple line at the bottom of the chart represents Ruby usage.

Next, a search for the terms “Ruby”, “PHP”, and “Python” in job postings on Indeed.com suggests that jobs seeking Ruby skills are in fact increasing. However, Ruby jobs trail both Python and PHP jobs in both absolute number and growth rate. The dark red line represents Ruby jobs on Indeed.com.

Based on this data, it’s difficult to argue that Heroku truly delivered a horde of developers to Salesforce.com’s door. At least today.

Ruby in the enterprise
For Ruby to in fact become the de facto language of “Cloud 2” as Benioff claims, Ruby needs to be accepted by enterprises.

As much as Salesforce.com and Heroku may be attempting to avoid IT, as Governor points out, when adoption grows sufficiently large within an enterprise, IT gets involved.

Today, few IT organizations approve of or support applications based on dynamic scripting languages. Fewer still are the enterprise middleware vendors that could point to substantial businesses selling dynamic scripting language-based solutions to enterprises. While both of these could simply be point-in-time statements, they both need to be reversed before Ruby can take off in the enterprise.

It would seem that Salesforce.com is betting that it can not only attract developers, but use its brand to win approval for Ruby in the enterprise.

An alternate reality could see Salesforce.com successfully driving Ruby’s fortunes with developers, and entrenched middleware vendors such as IBM, Oracle or Microsoft benefiting when IT organizations begin looking for Ruby-based solutions.

Interesting times ahead for Ruby, Salesforce.com and enterprises alike.


Oracle hasn’t won many friends in open source communities since its acquisition of Sun Microsystems and Sun’s array of open source assets including MySQL, Java and OpenOffice. Oracle seems bent on continuing this trend with the growing Hudson open source project.

Oracle controls the Hudson project trademark
The Hudson open source project delivers a leading continuous integration server along with over 300 plugins to support building and testing a wide variety of software projects.

Kohsuke Kawaguchi, then a Sun employee and now a key member of CloudBees, which aims to bring Hudson to the cloud, founded the Hudson project. Kawaguchi worked on Hudson as part of his Sun duties. As such, Sun, and now Oracle, retained ownership of the Hudson trademark and intellectual property.

Kawaguchi remains co-owner of the project, along with Winston Prakash, an Oracle engineer assigned to replace Kawaguchi after his departure from Oracle.

The tension between community and company over project decision making
The Hudson project hosted its developer and user mailing lists and source code on Java.net. However, downtime and reliability issues at Java.net encouraged the Hudson developer community to propose moving the mailing lists to Google Groups. In parallel, Oracle, also unhappy with reliability issues with Java.net, decided to upgrade the Java.net infrastructure and migrate the Hudson project to the new Java.net infrastructure.

Unfortunately, an Oracle email notifying Hudson users and developers of this migration was never delivered, because the sender was not subscribed to the mailing lists in question. As a result, Hudson developers have been locked out of the mailing lists and unable to access or update the source code for over a week.

Frustrated by the inability to access the Hudson source code, Kawaguchi proposed moving the source code to GitHub. Others supported the proposal on the developer mailing list, with no major objections raised for nearly a week. Then, Oracle’s Senior VP of Tools and Middleware, Ted Farrell, wrote to the Hudson mailing list to express Oracle’s concerns:

For now, however, we are going to stay on the java.net infrastructure. We believe it is important for Hudson to stay connected with the rest of the java community, as well as take advantage of some of the cool changes we will have coming to java.net. Moving to GIT can be done while staying on java.net. It is not a requirement to move to GitHub.

Because it is open source, we can’t stop anybody from forking it. We do however own the trademark to the name so you cannot use the name outside of the core community. We acquired that as part of Sun. We hope that everyone working on Hudson today will do as they claim to want, and work with us to make Hudson stronger.

When Hudson developers questioned whether the Hudson community had the right to make decisions about where the source code would be hosted, Oracle’s Farrell clarified:

…what I am saying is that I believe the *final* decision of what to do with respect to infrastructure belongs to Oracle and that decision should be made according to the will of the community as it makes sense…We are not prohibiting the developer community from making decisions. In fact we are encouraging that they help form the decisions being made. The decisions just need to be checked with the realities of hosting the community and what is best for the growth of the Hudson ecosystem (eg. syncing with other projects, etc.)

Benefiting from the GitHub community
Oracle, as the owner of the Hudson trademark, is well within its rights to decide where the Hudson project source code and community interactions are hosted. This point, however, is lost on the vast majority of Hudson users, who comment that the development community should fork the source code under a new name unencumbered by the trademark.

The larger issue is Oracle insisting that simply using Git on the new Java.net is enough to meet the project’s needs, something that the Hudson developer community disagrees with.

Git is based on the notion of being able to fork code and later merge the forked code, along with any changes, back into the mainline code base. Kawaguchi had previously written in support of Git’s fork and merge later approach:

For example, many contributors in the Japanese community hesitate to ask for a commit access, for one reason or another, but they can fork and push changes and send me e-mail all right.

GitHub has become a gathering place for developers, as Hudson project contributor R. Tyler Croy tries to explain to Farrell:

…one of the primary reasons for selecting GitHub instead of one of the many Git hosts such as Gitorious (including Kenai) is the very low barrier to entry for a lot of developers these days. We had considered “self-hosting” the Git service but pooh-poohed that idea in favor of GitHub since having a GitHub account is almost as common as having a twitter handle or Gmail address.

Hosting on GitHub would give the Hudson project exposure to a much larger developer base from which to attract future contributions.

Judging by comments from Hudson project developers, there is little desire to fork the code and found a new project under a different name. However, Oracle’s insistence on keeping the source code on Java.net, rather than leveraging the larger audience and collaboration found at GitHub, leaves the Hudson developer community in a difficult position.

For the sake of the Hudson community, and Oracle’s open source credibility, let’s hope cooler heads prevail and Oracle executives see the benefit of accessing the larger GitHub community.
