Forking is often viewed as a last resort for software projects. However, the growth of GitHub and of distributed version control systems, along with a reluctant acknowledgement from a key Subversion vendor, suggests that forking is going to become commonplace in 2011. Plan ahead to ensure your company is ready for this shift in development methodology.
Centralized version control rules the day, today
Version control systems (VCS) fall into two broad categories – centralized and distributed. The merits of each have been widely debated and are beyond the scope of this post. Here is a good detailed explanation of the differences.
A centralized VCS relies on a central server hosting the main or trusted version of a project, often referred to as the “trunk”. Developers check out and check in code against that central copy of the project. The complete repository, including its full history, exists in only one place: the central server. Developers on a project can see a change from another developer only once that developer has checked her changes in to the main trunk.
Distributed VCS, on the other hand, are designed so that any repository can be considered the “main or trusted version” of the project. Each developer has the entire project’s source code in a local repository on his or her computer. As such, developers on a team can share changes directly between their local repositories before merging those changes into a common central repository.
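The sharing pattern described above can be illustrated with a toy model. This is only a sketch: the `Repository` class, the developer names and the change identifier are all hypothetical, and real tools such as Git or Mercurial track far more than a set of change names.

```python
class Repository:
    """Toy repository: a name plus a set of change identifiers (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.changes = set()

    def commit(self, change):
        # Record a change locally; no other repository sees it yet.
        self.changes.add(change)

    def pull_from(self, other):
        # Copy every change the other repository has that this one lacks.
        self.changes |= other.changes


# Distributed style: every developer holds a full repository of their own.
central = Repository("central")
alice = Repository("alice")
bob = Repository("bob")

alice.commit("fix-login-bug")
bob.pull_from(alice)  # Bob receives Alice's change directly from her repository

# At this point Bob has the change, but the common repository does not.
shared_before_central = (
    "fix-login-bug" in bob.changes and "fix-login-bug" not in central.changes
)

central.pull_from(alice)  # later, the change reaches the common repository
```

In a centralized system the `bob.pull_from(alice)` step has no equivalent: Bob would see the change only after it had been checked in to the central server.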
The vast majority of version control systems used in public open source projects and internal enterprise software projects are centralized in nature.
Analysis of over 240,000 open source projects tracked by Ohloh demonstrates an overwhelming skew towards centralized VCS usage such as svn, svnsync and CVS. Distributed VCS such as Git, Mercurial and Bazaar account for just 14 percent of usage.
Data from the 2010 Eclipse User Survey, which can be used as a proxy for internal enterprise software project usage patterns, reveals a similar skew towards centralized VCS. Distributed VCS usage accounts for just 11 percent of the version control systems reported by the 1,528 respondents to this question in the survey.
|Name|Responses|% of Responses|
|---|---|---|
|Distributed VCS: Git/GitHub|115|7.5%|
|Distributed VCS: Mercurial|51|3.3%|
|Centralized VCS: Subversion|989|64.7%|
|Centralized VCS: CVS|214|14.0%|
|Centralized VCS: Other|159|10.4%|
Source: 2010 Eclipse User Survey
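For readers who want to check the 11 percent figure, the arithmetic can be reproduced from the response counts in the table. The snippet below is a minimal sketch; the dictionary keys are shortened labels chosen here for illustration, and the counts are copied from the survey table.

```python
# Response counts copied from the 2010 Eclipse User Survey table.
responses = {
    "Git/GitHub": 115,
    "Mercurial": 51,
    "Subversion": 989,
    "CVS": 214,
    "Other (centralized)": 159,
}
# Which of the labels above count as distributed VCS.
distributed = {"Git/GitHub", "Mercurial"}

total = sum(responses.values())
distributed_count = sum(n for name, n in responses.items() if name in distributed)
distributed_share = 100 * distributed_count / total

print(f"{distributed_count} of {total} responses = {distributed_share:.1f}%")
```

The distributed share works out to roughly 10.9 percent, which rounds to the 11 percent quoted in the text.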
This data suggests that open source projects are ahead of the curve in adopting distributed VCS. However, the 3 percentage point difference in usage between the two data sources could be well within the margin of error for each source.
Suffice it to say, distributed VCS are not commonplace in today’s software development practice. But that’s about to change.
Expect growing use of Git/GitHub, Mercurial and the like in 2011
Forrester analyst Jeffrey Hammond tweeted, “A sign that git has arrived,” linking to a press release from WANdisco. WANdisco is a key vendor behind Subversion, the open source centralized VCS. According to the press release:
“Enough is enough,” said David Richards, President and CEO of WANdisco. “Subversion gets a lot of criticism due to the shortcomings of branching and merging, especially when compared with GIT and others, and we simply don’t have the time to debate whether or not this should be done when it clearly should be.”
As a result, WANdisco will be devoting resources to improving Subversion’s branching and merging capabilities.
The press release clearly demonstrates that the growth of Git and other distributed version control systems is raising concern at WANdisco and among some of the largest users of Subversion.
GitHub, a Git-based online community for collaborative development, counts over 508,000 users hosting 1,524,000 git repositories as of this week.
According to RedMonk analyst Stephen O’Grady’s analysis of repository type mentions on Hacker News, distributed VCS account for 86.5 percent of repositories mentioned, and 82.06 percent of the total mentions are for Git alone. As O’Grady explained, “this dataset is interesting not because it is representative of developers as a whole, but rather because it’s a community of technologists who are collectively ahead of the curve.”
Prepare for distributed version control in your enterprise
The growth of Git, GitHub and the forthcoming changes to Subversion give IT decision makers a reason to consider distributed version control systems in 2011.
As with any shift in the software industry, decision makers are advised to experiment with a distributed VCS on a small project to gain experience without impacting business critical systems or projects. A small trial project could help identify internal process changes required when shifting from the current centralized VCS to a distributed VCS.
Keep in mind that distributed version control systems place a complete copy of a project’s source code on each developer’s local computer. If your current development practice requires that only portions of the source code tree be available to certain developers, then you’ll need to split the overall project across multiple repositories and give developers access only to the appropriate ones.
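One way to plan such a split is to write down which developers need which repositories before any migration begins. The sketch below is purely illustrative: the repository names, developer names and helper functions are hypothetical, and actual enforcement would be handled by whatever hosting or access control tooling you adopt.

```python
# Hypothetical layout: one overall project split into three repositories,
# each with the set of developers allowed to clone it.
REPO_ACCESS = {
    "billing-core": {"alice", "bob"},
    "web-frontend": {"alice", "carol"},
    "shared-docs": {"alice", "bob", "carol"},
}


def repos_for(developer):
    """Return the sorted list of repositories a developer may clone."""
    return sorted(
        repo for repo, members in REPO_ACCESS.items() if developer in members
    )


def can_clone(developer, repo):
    """True if the developer is on the access list for the repository."""
    return developer in REPO_ACCESS.get(repo, set())
```

For example, `repos_for("carol")` lists only the front-end and documentation repositories, so a clone on Carol's machine never contains the billing code.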
Additionally, if your developers use laptops and there is a risk of a laptop being lost or stolen, consider that the entire source code of the project, including its history, is now on that laptop, rather than just a working copy of a single branch as with your traditional centralized version control system.
Finally, plan for training to help developers familiar with centralized VCS approaches learn how to quickly become productive in a distributed VCS environment.
None of these cautions should be viewed as reasons to ignore distributed VCS in your development environment in 2011.
Get ahead of the curve with distributed VCS. If developer views on GitHub are any indication, your developers will thank you.