A Version Control Primer for the Rest of Us
- Written by Alexander Riedel
- Last Updated: 03 May 2017
- Created: 12 April 2017
When a developer starts a new job, he or she is usually set up with a workstation, the current version of the development environment that is being used (Visual Studio, Android Studio, Xcode etc.) and whatever version control system the company or department uses.
An admin is usually told to bring his own laptop and a sandwich and use Notepad or something else that is free.
I exaggerate of course, but from some of the comments I hear, it is not that far off. A version control system is not something that really occupies the minds of IT management, so frequently the individual admin is left to his or her own devices.
So, let's examine what questions and decisions will influence your choices. Of course I will use our own Version Control System, VersionRecall, to illustrate some of the topics. Don't let this discourage you from reading to the end. All of these questions will apply to any system you will ultimately choose to use.
1. What is Version Control?
Version Control, also referred to as source control or Source Code Management (SCM), generally stores subsequent versions of source code files so you can access a history of changes. It commonly allows you to restore previous versions of any submitted file, in case a change broke functionality or a file became corrupted. It also keeps a record of who changed what in a team environment. Depending on the system used, comments added on submit serve as a reminder for yourself and other team members of what changes were made at the time.
Most modern version control systems also support binary files, which allow you to submit supporting files, e.g. images, icons, data files, along with your source code.
Additional features to look for are release management, which allows you to specify milestones and mark specific file versions for a project, and autonomous submits, which ensure files are submitted even if you forget.
2. Why do I need version control?
If you ever worked on a new version of software for weeks and then heard a clunk and a hard drive spinning down, you know what a horrible feeling that is. With modern SSDs you don't hear the noise, but the dread is the same. Back when version control systems were invented (IBM, ca. 1962, SCCS 1975) hardware failure and human error were the main factors that could corrupt or lose source code.
Today you have a few more threats that can ruin your day, ranging from malware and ransomware to laptop theft or simply forgetting to pick up your notebook or tablet after airport security. And... yes, that happens.
Try explaining to your boss that the new features you have been bragging about in meetings all summer are all lost, because you clicked on a link in that one email from your friend.
Yeah, that friend…
So the question is more like "How can you not need version control?"
3. How is my code stored?
Depending on your system of choice, your code is stored in binary files of some proprietary format or in some type of database, e.g. SQL Server in the case of Microsoft's Team Foundation Server (TFS). Some systems employ a forward delta scheme, others use reverse delta or store your files complete within their proprietary file format. Forward delta means that the initial version of a file is stored and for subsequent versions only the differences are stored. Reverse delta means that your latest version is stored in its entirety and only the differences are stored for previous versions.
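[Figure: Forward delta keeps version 1 complete and stores Difference 1 -> 2, Difference 2 -> 3 and Difference 3 -> 4. Reverse delta keeps version 4 complete and stores Difference 4 -> 3, Difference 3 -> 2 and Difference 2 -> 1.]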
The forward and reverse delta methods were devised when disk space was at a premium. Since most daily source code submissions are identical except for the few lines you added, deleted or modified, storing only the differences saves a lot of disk space.
Of course, that comes with a performance penalty as you might imagine. Forward delta schemes slow down over time as it takes longer and longer to generate the last submitted version so it can be compared to the current version you want to submit. Naturally, retrieving old versions becomes faster and faster as you move towards the beginning of a file's history.
Reverse delta storage is very fast when submitting a new version as the previous submit is stored completely and can easily be used for comparison. Access to previous versions slows down as you walk down memory lane, because more and more versions (deltas) need to be applied.
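To make the trade-off a bit more tangible, here is a minimal Python sketch of both schemes. It is not how any particular product stores its data; the class names, the delta format and the sample versions are all made up for illustration. Each `get` call also returns how many deltas had to be applied, which is the cost discussed above.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe how to turn the line list `old` into the line list `new`."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))       # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))    # store the new lines themselves
        # "delete" needs no entry: those old lines are simply not copied
    return delta


def apply_delta(old, delta):
    """Rebuild the target version from `old` and a delta from make_delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


class ForwardDeltaStore:
    """Hypothetical store: first version complete, later versions as deltas."""

    def __init__(self):
        self.first = None
        self.deltas = []  # deltas[k] turns version k into version k + 1

    def submit(self, lines):
        if self.first is None:
            self.first = list(lines)
        else:
            latest, _ = self.get(len(self.deltas))  # must rebuild the latest first
            self.deltas.append(make_delta(latest, list(lines)))

    def get(self, index):
        lines, applied = self.first, 0
        for delta in self.deltas[:index]:
            lines = apply_delta(lines, delta)
            applied += 1
        return lines, applied


class ReverseDeltaStore:
    """Hypothetical store: latest version complete, older versions as deltas."""

    def __init__(self):
        self.latest = None
        self.deltas = []  # deltas[k] turns version k + 1 back into version k

    def submit(self, lines):
        if self.latest is not None:
            self.deltas.append(make_delta(list(lines), self.latest))
        self.latest = list(lines)

    def get(self, index):
        lines, applied = self.latest, 0
        for delta in reversed(self.deltas[index:]):
            lines = apply_delta(lines, delta)
            applied += 1
        return lines, applied


# Four made-up versions of a tiny file (index 0 is the oldest).
versions = [
    ["a", "b"],
    ["a", "b", "c"],
    ["a", "x", "c"],
    ["a", "x", "c", "d"],
]
forward, reverse = ForwardDeltaStore(), ReverseDeltaStore()
for version in versions:
    forward.submit(version)
    reverse.submit(version)

print(forward.get(3)[1], forward.get(0)[1])  # deltas applied: newest, oldest
print(reverse.get(3)[1], reverse.get(0)[1])
```

With four versions, the forward store applies three deltas to reach the newest version and none for the oldest; the reverse store is the mirror image.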
Nowadays disk space is cheap, so there is no longer any need to scrounge for every single byte. That is why VersionRecall keeps every individual version as a complete file instead of only storing differences. If you really run out of disk space, getting a bigger hard drive or SSD is generally not a cost factor anymore.
There is obviously no performance penalty when submitting or retrieving a version, since all files are stored entirely. So, access time is the same for the current version or the very first version of a file, no matter how far apart.
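For comparison, storing complete files can be sketched in a few lines of Python. To be clear, the folder layout and the function names below are assumptions made for this example and not VersionRecall's actual repository format. The point is only that every version is a plain copy, so retrieving version 1 costs the same as retrieving the latest.

```python
import shutil
import tempfile
from pathlib import Path


def submit_full_copy(source: Path, repository: Path) -> Path:
    """Copy `source` into the repository as the next numbered version."""
    version_dir = repository / source.name          # one folder per file (assumed layout)
    version_dir.mkdir(parents=True, exist_ok=True)
    number = len(list(version_dir.iterdir())) + 1   # 1, 2, 3, ...
    target = version_dir / f"{number:04d}{source.suffix}"
    shutil.copy2(source, target)                    # complete file, native format
    return target


def retrieve(repository: Path, name: str, number: int, destination: Path) -> None:
    """Copy one stored version back; no deltas need to be applied."""
    stored = repository / name / f"{number:04d}{Path(name).suffix}"
    shutil.copy2(stored, destination)


# Demo in a throwaway folder: submit two versions, then get the first one back.
with tempfile.TemporaryDirectory() as folder:
    sandbox, repo = Path(folder) / "sandbox", Path(folder) / "repo"
    sandbox.mkdir()
    script = sandbox / "backup.ps1"
    script.write_text("Write-Host 'version 1'\n")
    submit_full_copy(script, repo)
    script.write_text("Write-Host 'version 2'\n")
    submit_full_copy(script, repo)
    retrieve(repo, "backup.ps1", 1, script)
    restored_text = script.read_text()

print(restored_text)
```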
4. What is this 'repository' you speak of and why would you request someone to pull something?
The terminology used by the various version control systems can be confusing. It may be perfectly clear to your developer friends what a sandbox and repository are. You hear people throw terms around like submit, commit, check-in, pull request, push, diff or merge. Each version control system uses its own vocabulary and that can make things even more challenging.
Once you have decided on a system you will need to learn the proper words for things, but for now we will start with some generic terms.
A 'sandbox' is your local folder where you keep your source files. So, that is basically where they have always been. If you edit your files on a share on a file server, you may want to reconsider that because it does not make much sense in the context of a version control system. The term stems from the notion that the 'sandbox', i.e. your local drive, is where you 'play' with your code until it works. Then you 'submit' that code to a 'repository', which is the remote storage for your code. Usually the words chosen for version control operations have a directional quality to them that makes it fairly easy to determine what they do. Submit, commit, check-in, push indicate an operation towards the repository, whereas get, check-out, get latest, restore are names used for an operation outwards from the repository.
Just know that some systems can break that rule so when in doubt, look it up. Additionally, some systems like Git use a two-step process, commit and push, to apply changes to the remote repository.
The reference to the pull request in the title of this section is a play on the pull requests used with Git hosting services such as GitHub. A pull request basically informs other users that you have pushed changes and asks them to review those changes and pull them into the main code base.
5. Where should I store my code?
For obvious reasons you should keep your version control repository on a different physical drive than your source code. But where should you put it? Depending on your needs and your chosen version control system you have a few choices.
If your version control system requires a server component you are somewhat limited in your choices, but generally you can still choose between running your own local server and using a cloud based service. A local Git server versus using GitHub is a good example of this.
Version control systems like VersionRecall, which do not require a dedicated server component, give you a few more choices. You can put your repository on a NAS, an external USB drive or a file server in your corporate environment. You can also put the repository in a folder that is replicated into cloud storage, e.g. Dropbox. If you work with multiple computers this is also a great way to have a synchronized repository between machines.
With the new threat of ransomware, which basically will encrypt all files within reach until you pay up, you must also employ some type of offline storage strategy. Depending on your version control system you may need to resort to a traditional offline backup medium, or you can use whatever mechanism the system offers.
VersionRecall is designed to operate with any repository in reach and you can always select another one manually at any time. This makes it fairly easy to plug in a USB drive, select the drive and folder for a repository and submit. Then you unplug the drive again and toss it back into the drawer.
The important part is that you have a physically disconnected alternate storage medium for your source code.
6. How many versions do you need?
It is pretty easy today to turn into a digital pack rat. How many versions of your source code files and auxiliary files you really need to keep depends on a few factors. For one, it very much depends on how often you submit changes. Whether you submit changed files once every hour or once a day makes a huge difference. Secondly, your company's policies will also influence that decision. If you are working for a government agency you may be subject to their requirements.
Generally, it is a good idea to keep about six months' worth of versions. I have personally only once had a case where I needed to retrieve a single file from a version control system older than that.
Take a look at the version history of the file you modify the most. Depending on how often you submit and how many other people work on that file, you can end up with a few hundred versions very easily, all of which only differ in mostly minute details. Review how easy or how time consuming it is to find a particular change in that file's history. That exercise will give you a good sense of how you should adjust your version limits and your submission policies.
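If you want to see what a six-month limit would do to a file's history before you change any settings, a small Python sketch like the following can help. The version labels, the dates and the 183-day cutoff are made-up assumptions; a real system will have its own way of listing versions and applying limits.

```python
from datetime import datetime, timedelta


def split_by_age(versions, now, keep_days=183):
    """Split (label, submitted_at) pairs into versions to keep and to prune."""
    cutoff = now - timedelta(days=keep_days)
    keep = [v for v in versions if v[1] >= cutoff]
    prune = [v for v in versions if v[1] < cutoff]
    # Safety net: never prune everything, always keep the newest version.
    if not keep and prune:
        newest = max(prune, key=lambda v: v[1])
        prune.remove(newest)
        keep.append(newest)
    return keep, prune


# Hypothetical history of one file.
history = [
    ("v1", datetime(2016, 9, 1)),
    ("v2", datetime(2016, 12, 15)),
    ("v3", datetime(2017, 4, 30)),
]
keep, prune = split_by_age(history, now=datetime(2017, 5, 3))
print("keep:", [label for label, _ in keep])
print("prune:", [label for label, _ in prune])
```

The safety net is a design choice of this sketch: a file that has not changed in a year should still have its latest version on record.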
Having a version control system does of course not absolve you from keeping a well-maintained chain of backups. There is always a chance a customer is stuck on a version of your software that is more than two or three years old. In software terms that is ancient history. Luckily some version control systems offer additional backup functionality to create archives or snapshots on external media.
Even if the server where you keep your source code repository is subject to a strict backup regime, always create your own as well. If you do not manage your server backup, you do not know what will be available two years from now.
7. When should I submit changes?
The golden rule for software developers is to never submit anything to a version control system that does not compile. One could argue that the DevOps equivalent for administrative scripts is to never submit anything that does not run successfully.
While a system with automatic submits, such as VersionRecall, can be configured to submit changes every thirty minutes, specifying that short an interval is not recommended in most cases.
We here at SAPIEN have frequently received questions on how you can submit a file to a version control system from PowerShell Studio or PrimalScript every time a file is saved. Since generally ALL modified files are saved when you run or debug a script, this would create a huge number of versions during a regular edit-test-debug cycle. The sheer number of almost identical versions would make it very difficult to find something specific later on, so that is just not a viable strategy.
So when should you submit? I recommend that you manually submit a version of any file that has passed your own testing; i.e. if it operates normally and everything you wanted to accomplish seems to have been done. Additionally, if you have the option as with VersionRecall, set an automatic submit every day at about 30 minutes before you usually leave or stop working.
8. The format dilemma
As I discussed in "How is my code stored?", your files can be stored in any number of ways, from a database to some proprietary binary format, as full versions or just as changes (also referred to as 'deltas'). Why is this important?
Imagine you still use the trusty old Microsoft SourceSafe that has served you so well over the years. Now your hard drive goes boom and you need to get your code back on another machine. Unless you are truly super organized, you will struggle to find an installable version of Microsoft SourceSafe. Microsoft no longer hosts any downloads. This means you need to find an old MSDN disc or go to eBay. It is not impossible, but you will sit and wait for a while. Which does not bode well for the project that was supposed to be finished on Tuesday.
If you are a little bit older you may remember the tape backups from the 90s. These worked really well until you needed to restore something. You made diligent grandfather-father-son backup tapes, only to find out during a restore that the new software version could not read any of the tapes you produced with the older version.
If you are not old enough to remember these, find a grizzled veteran admin and ask. Bring some sort of bribe, donuts, coffee or beer though.
The important point is, when you choose a system, practice getting your data back, don't just assume it will work. Close your laptop and pretend it is lost somewhere. Can you get your files quickly and effortlessly from wherever you store them?
This is of course also the time to point out that VersionRecall stores all your files in their native formats. So just go to the folder on your NAS/Server where your repository is or plug in that external drive. Even if we no longer make the software, mess up compatibility with older versions or you need to get the files quicker than you can install VersionRecall (unlikely!), you can just copy the files as they are. Back in business in a few minutes.
But again, whatever system you choose, practice data retrieval so you know what to do when things go wrong.
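One way to make such a practice run more than a gut feeling is to compare what you got back against what you expected. The Python sketch below assumes you restored into a separate folder while you still have the originals at hand; the folder names and the function names are invented for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 checksum of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_trees(original: Path, restored: Path):
    """List files that are missing from, or differ in, the restored folder."""
    problems = []
    for source in sorted(p for p in original.rglob("*") if p.is_file()):
        relative = source.relative_to(original)
        candidate = restored / relative
        if not candidate.is_file():
            problems.append((relative.as_posix(), "missing"))
        elif file_digest(source) != file_digest(candidate):
            problems.append((relative.as_posix(), "differs"))
    return problems


# Demo: one file was never restored, another came back as an older version.
with tempfile.TemporaryDirectory() as folder:
    original, restored = Path(folder) / "original", Path(folder) / "restored"
    (original / "scripts").mkdir(parents=True)
    (restored / "scripts").mkdir(parents=True)
    (original / "readme.txt").write_text("notes\n")
    (original / "scripts" / "deploy.ps1").write_text("Write-Host 'deploy'\n")
    (restored / "scripts" / "deploy.ps1").write_text("Write-Host 'deploy v0'\n")
    problems = compare_trees(original, restored)

for name, issue in problems:
    print(f"{name}: {issue}")
```

An empty list means the restore matched; anything else is worth finding out about before the real emergency.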
9. Lone wolf versus team development
If you develop by yourself rather than in a team, you will not use a subset of functionality of version control systems, namely locking and branching. But even if you are by yourself, take these features into consideration when you choose a system because you never know what will happen a few months from now. You may have to work with other folks on that project and you do not want to lose time battling "who's changed what" questions.
In a team, there are two basic ways to handle several people working on the same file. One is the exclusive style, where files get checked out by one person and remain under that user's control until checked in again. Version control systems designed for larger teams usually offer this as an option, if not as the default mode of operation. It limits write access to a file to one person, so no one can modify that file and add features while you rearchitect the code to get rid of a bug.
The other method allows everyone to work on a file and submit modifications. Concurrent changes are usually merged. If two developers start out with a version A of a file and modify two different locations in that file, you end up with a version that contains BOTH modifications.
If there is a conflict, for example both developers edit the same location, a new branch is created. You then usually need a manager or someone with authority to manage and oversee the entire project in version control to merge branches and keep everything on track.
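The merge idea can be sketched in Python as a very reduced three-way merge. This is a simplification under a loud assumption: both developers only changed lines in place, so all three versions have the same number of lines. Real merge tools also handle inserted and deleted lines, and this sketch merely reports conflicts instead of creating a branch. The function name and the sample script lines are made up.

```python
def merge_in_place_edits(base, mine, theirs):
    """Three-way merge for versions that only changed existing lines.

    Returns the merged lines and the 1-based numbers of conflicting lines.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles edits that keep the line count")
    merged, conflicts = [], []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:        # unchanged, or both made the identical change
            merged.append(m)
        elif m == b:      # only the other developer touched this line
            merged.append(t)
        elif t == b:      # only I touched this line
            merged.append(m)
        else:             # both changed the same line differently
            merged.append(b)
            conflicts.append(number)
    return merged, conflicts


version_a = ["Get-Service", "Start-Sleep 5", "Write-Host 'done'"]
alice = ["Get-Service -Name Spooler", "Start-Sleep 5", "Write-Host 'done'"]
bob = ["Get-Service", "Start-Sleep 5", "Write-Host 'all done'"]

merged, conflicts = merge_in_place_edits(version_a, alice, bob)
print(merged, conflicts)      # contains BOTH modifications, no conflicts

carol = ["Get-Service", "Start-Sleep 10", "Write-Host 'done'"]
dave = ["Get-Service", "Start-Sleep 30", "Write-Host 'done'"]
_, clash = merge_in_place_edits(version_a, carol, dave)
print(clash)                  # line 2 was edited by both
```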
Obviously, this is only a simplified view of the topic and you should study team features in any system you consider in great detail.
VersionRecall takes a more laid back approach to this. By default, it allows everyone to submit new versions. No automatic merging occurs, as it is our experience that this tends to lead to unforeseen side effects. It is quite easy for developers to select two versions of a file, view and merge the differences manually and thus create a combined version. As with any conflict in development, it requires the two developers working on the same file to communicate and resolve parallel work.
Of course VersionRecall has a locking mechanism which allows you to claim complete control over a file, signaling to other users that they should not interfere. Especially if you are working on files that are difficult or impossible to merge, e.g. binary files or images, it is a great way to keep others away from a file until you make your changes.
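To give you an idea of what a lock boils down to, here is a hedged Python sketch that uses a small marker file next to the file being claimed. This is not how VersionRecall or any other specific system implements locking; the `.lock` suffix and the function names are assumptions for illustration. The key detail is that the marker is created with an "only if it does not exist yet" flag, so two people cannot both succeed.

```python
import os
import tempfile
from pathlib import Path


def _lock_path(path: Path) -> Path:
    """Marker file that signals a claim on `path` (assumed naming scheme)."""
    return path.with_name(path.name + ".lock")


def acquire_lock(path: Path, user: str) -> bool:
    """Claim the file; returns False if someone else already holds the lock."""
    try:
        # O_EXCL makes the creation fail if the marker already exists.
        fd = os.open(_lock_path(path), os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(user)
    return True


def lock_owner(path: Path):
    """Name of the user holding the lock, or None if the file is free."""
    lock = _lock_path(path)
    return lock.read_text() if lock.exists() else None


def release_lock(path: Path, user: str) -> bool:
    """Only the owner may release the lock."""
    if lock_owner(path) != user:
        return False
    _lock_path(path).unlink()
    return True


with tempfile.TemporaryDirectory() as folder:
    logo = Path(folder) / "logo.png"            # a file that cannot be merged
    first = acquire_lock(logo, "alice")         # alice claims it
    second = acquire_lock(logo, "bob")          # bob is turned away
    owner = lock_owner(logo)
    released = release_lock(logo, "alice")
    third = acquire_lock(logo, "bob")           # now bob gets his turn
    release_lock(logo, "bob")

print(first, second, owner, released, third)
```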
As with all team development, the larger your team, the more management you need. When you employ version control in a team you need to make sure someone with sufficient knowledge of the system you choose is in charge.
10. Developing on the go
Here is where it gets a little complicated. All these systems from the past decades make the assumption that you are somehow always connected to the place where you store your versions. Think desktop computer and network file server. Most modern systems assume that you store everything in the 'cloud' and that you have an internet connection at all times.
Sure, they all make provisions for the times you are not connected, for example when you are on one of those airplanes from yesteryear with no WiFi, but that leaves you with basically no version control functionality.
As a side note, Git and a number of other systems do store versions you commit even when you are not connected to a server. But since that data is stored right next to your files in a subfolder, it does not fulfill the "on a different medium" criterion I discussed earlier in section 5.
VersionRecall allows an unlimited number of locations for a repository. It will use the first one that is in reach unless you select a specific one yourself. So, what does that exactly mean?
It means you can set up a repository on your NAS at home, your file server at work, your external USB drive that you take when you travel and, worst case scenario, the Compact Flash card in your notebook's CF slot. VersionRecall will automatically use the first one available wherever you are. Of course, the repositories in these locations will all contain different histories, meaning that you should always make sure you submit changes at your main location whenever you can.
But you will also always have something to fall back on, even when you travel in places without any connectivity. Just make sure you take that CF card out before you leave your laptop behind at the TSA checkpoint.
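The "first one in reach" idea is simple enough to sketch in Python. The candidate locations and the function name below are hypothetical, and the check is deliberately naive (a folder either exists or it does not); it only illustrates the selection logic described above, not VersionRecall's implementation.

```python
import tempfile
from pathlib import Path


def first_reachable(candidates):
    """Return the first repository folder that currently exists, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():      # an unplugged drive or offline share fails here
            return path
    return None


# Demo: the NAS and the file server are "offline", the USB drive is plugged in.
with tempfile.TemporaryDirectory() as folder:
    usb = Path(folder) / "usb-drive" / "repo"
    usb.mkdir(parents=True)
    candidates = [
        Path(folder) / "nas" / "repo",         # preferred, but not reachable
        Path(folder) / "fileserver" / "repo",  # not reachable either
        usb,                                   # fallback for travel
    ]
    chosen = first_reachable(candidates)
    nothing = first_reachable(candidates[:2])

print(chosen.parent.name if chosen else "no repository in reach")
```

The order of the list is the whole policy: put your main location first so the fallbacks are only used when you really are on the road.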
11. Graphical user interface versus command line
With the advent of PowerShell the command line has made a little bit of a comeback. After many decades of clicking buttons and dragging and dropping suddenly it is cool again to sit in front of a console and type cryptic command line statements with options a mile wide. All jokes aside, a version control system must have a command line tool, so you can automate its functions and, if no other API is provided, integrate it with development, build and deployment tools.
While command line tools are very good for automating common tasks, they are not so great for human-machine interaction. It is kind of the reason we developed graphical user interfaces in the first place. In the "olden days" on UNIX we did make do with DIFF output in a console to see what changed in a file, but really only because we had to. Today we have graphical diff and merge tools and explorer-style user interfaces that make it easier to find the desired versions or changed parts of a version. We no longer have to repeatedly issue the same command because we scrolled past the relevant output in a command line.
Some systems are command line only. Graphical tools are provided by third party vendors, sometimes for a fee and sometimes free. Evaluate what is available and how often it is updated to keep up with the actual system you choose.
12. Common traps and pitfalls
While many developers have spent years using different version control systems and learned some of these lessons the hard way, System Admins and DevOps are generally new to version control. So here is a list, albeit incomplete, of some common problems you will face with such systems.
- The holiday neglect: People go on vacation. They get sick. Attend weddings. You need to have rules and procedures which cover when people need to submit their changes. I have seen SourceSafe repositories with files still checked out to employees who left three years ago, so make rules and enforce them.
- Trust the system fallacy: Always check on a regular basis if you can still access your repository. Can you still retrieve, view and compare older versions? Updates to the software, corrupted database files, file shares which no longer exist, changed permissions and so forth can all throw you a curveball when you need to retrieve something from an older version.
- Misplaced faith in the cloud: With almost continuous internet connectivity we think less and less about where our data is stored. If it is "in the cloud" we have grown to assume that someone else will protect, back up and, if necessary, restore our data. Pretend your internet connection goes out and stays out for days. Assume your cloud storage provider got hacked and data just disappeared. Can you still operate? Can you recover? Turn off your DSL or cable modem and see what still works.
- Lightning does not strike twice: Yes, it actually does. If your laptop gets fried by a power surge, so can the NAS you placed your repository on. It is possible to lose your source code and all your version history in one quick spark and a cloud of magic smoke escaping. Invest in surge protectors, backups and offsite storage even if you have a version control system.
- Don't forget the manager: Depending on the size of your team, you will need a part-time or full-time manager for your version control system. Don't forget to task someone with updating and managing the software and its repositories.