Git is software that keeps track of changes you make to files and directories, and it's especially good at keeping track of changes to text. Imagine that you have a document. You start with version 1 of that document. You make some changes to it, and now you have version 2. You make some more changes, and now you have version 3. Git keeps track of those three versions for you. It lets you move back and forth between them and compare them to see what changed.
Git is referred to as a version control system, or VCS for short. Git is not the first version control system ever created; there have been others. Almost all of them had one primary purpose: managing source code.
Programmers wanted a way to track the changes they made to their code over time as they added features and fixed bugs, so they created version control. Because of this, version control systems are also called source code management tools, or SCM for short. The two terms are used more or less interchangeably.
The History Behind Git
Several version control systems predate Git. We will look at some of the most popular and most influential of them, because they help us better understand Git.
The first of these is SCCS, the Source Code Control System. It was released in 1972, was developed by AT&T, and was bundled free with the Unix operating system.
In primitive version control, you might have a file, like a budget, and save version one, version two, and version three of it, giving each one a different file name. When you do that, you're saving the full document three separate times, which is not a very efficient approach. SCCS instead keeps the original document and, rather than saving the whole document a second time, saves only a record of what changed. So if you want version five of a document, you take version one and apply four sets of changes to it. That's a much more efficient way to store changes over time.
SCCS stayed dominant until the early 1980s, when RCS, the Revision Control System, was developed. RCS made lots of improvements over SCCS. For one thing, it was cross-platform, whereas SCCS was Unix-only; with the rise of the personal computer, it was important to have a version control system that also worked on PCs. It was also more intuitive, with a cleaner syntax, fewer commands, and more features. Most importantly, it was faster, and much of that speed came from a smarter storage strategy. Remember that SCCS stored the original file and then kept track of every set of changes made after it. RCS flipped that around: it kept the most recent version of the file in full, and if you wanted to go back in time to a previous version, it applied the change sets in reverse.
If you think about it, that's a lot faster, because most of the time we want to work with the current document. With SCCS, if there were 20 sets of changes, you had to pull up the original and then wait while all 20 were applied just to see the current version. With RCS, you simply bring up the current file, which is already stored in full.
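To make the difference between the two storage strategies concrete, here is a minimal sketch of the simplified model described above, assuming each document is just a list of text lines. The names `make_delta`, `apply_delta`, `ForwardStore`, and `ReverseStore` are hypothetical, the deltas are computed with Python's standard `difflib`, and this is not the actual on-disk format of SCCS or RCS. `ForwardStore` keeps version 1 in full and replays every delta to reach the newest version; `ReverseStore` keeps the newest version in full and only applies deltas when you ask for something older.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Record how to turn `old` into `new`: ranges to copy from `old`, plus new lines."""
    delta = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=old, b=new).get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))        # reuse old[i1:i2] unchanged
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))     # store only the lines that are new
        # "delete": nothing to copy or add, the old lines are simply dropped
    return delta


def apply_delta(old, delta):
    """Rebuild the target version from `old` and a delta produced by make_delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


class ForwardStore:
    """Hypothetical SCCS-style store: version 1 in full, then deltas v1->v2, v2->v3, ..."""

    def __init__(self, first):
        self.base = list(first)
        self.deltas = []

    def count(self):
        return len(self.deltas) + 1

    def commit(self, new):
        current = self.get(self.count())
        self.deltas.append(make_delta(current, list(new)))

    def get(self, version):
        # Getting the newest version replays every delta, starting from version 1.
        doc = self.base
        for delta in self.deltas[:version - 1]:
            doc = apply_delta(doc, delta)
        return list(doc)


class ReverseStore:
    """Hypothetical RCS-style store: newest version in full, plus deltas that step backward."""

    def __init__(self, first):
        self.head = list(first)
        self.back = []  # back[k] turns version k+2 into version k+1

    def count(self):
        return len(self.back) + 1

    def commit(self, new):
        new = list(new)
        self.back.append(make_delta(new, self.head))  # how to step from new back to the old head
        self.head = new

    def get(self, version):
        # Getting the newest version applies no deltas at all.
        doc = self.head
        for delta in reversed(self.back[version - 1:]):
            doc = apply_delta(doc, delta)
        return list(doc)


versions = [
    ["Budget", "Rent: 1000"],
    ["Budget", "Rent: 1000", "Food: 300"],
    ["Budget", "Rent: 1100", "Food: 300"],
]
fwd, rev = ForwardStore(versions[0]), ReverseStore(versions[0])
for v in versions[1:]:
    fwd.commit(v)
    rev.commit(v)
print(fwd.get(3) == rev.get(3) == versions[2])  # True
```

Both stores hold the same information and can rebuild any version; the only difference is which end of the history is kept whole, and therefore which versions are cheap to retrieve.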
One of the problems with both SCCS and RCS was that they only allowed you to work with an individual file, one at a time. So you could track changes in a single file but not in sets of files or in a whole project.
CVS, the Concurrent Versions System, allowed you to do that. But the real innovation in CVS is not just that you can work with multiple files; it's the concurrent part. You can keep your code in one place, called the code repository, put that repository on a remote server, and have more than one user work on the same file at the same time. They can work concurrently.
With earlier systems, only one person could work on a file at a time. So CVS added many features that let users share their work and update their own files with changes that other people had made and placed in the remote repository.
Apache Subversion, or SVN for short, improved further on the idea of working with remote repositories. SVN was faster than CVS and handled non-text files, like images, much better than CVS did. Most importantly, SVN's big innovation was that it tracked not just changes to single files or to groups of files, but what happened in a directory as a whole. It watched files and directories collectively and took a snapshot of the entire directory. CVS, by contrast, updated files one at a time as it applied or read back changes. SVN instead used a transactional commit: either all of the changes to the directory were applied, or none of them were. The snapshot covered more than individual files; it captured the entire set of changes happening to that directory at one time. It's a subtle but important difference.
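The all-or-nothing idea behind a transactional commit can be illustrated with a small sketch. It assumes a repository is just a Python dict mapping file paths to contents; `write`, `WriteError`, `commit_file_by_file`, and `commit_atomically` are hypothetical names, and a `None` content stands in for something going wrong partway through (say, a dropped connection). This is not how CVS or SVN work internally, only a model of the difference described above.

```python
class WriteError(Exception):
    """Hypothetical failure while writing one file."""


def write(tree, path, content):
    # A None content simulates a failure partway through a commit.
    if content is None:
        raise WriteError(path)
    tree[path] = content


def commit_file_by_file(tree, changes):
    """Per-file updates: each file is changed as we go, so a failure leaves a half-applied commit."""
    for path, content in changes:
        write(tree, path, content)


def commit_atomically(tree, changes):
    """Transactional commit: stage every change on a copy, then publish them all at once."""
    staged = dict(tree)
    for path, content in changes:
        write(staged, path, content)  # a failure here leaves `tree` untouched
    tree.clear()
    tree.update(staged)


changes = [("src/a.c", "v2"), ("src/b.c", None), ("src/c.c", "v2")]

t1 = {"src/a.c": "v1", "src/b.c": "v1", "src/c.c": "v1"}
try:
    commit_file_by_file(t1, changes)
except WriteError:
    pass
# t1 is now {"src/a.c": "v2", "src/b.c": "v1", "src/c.c": "v1"}: a mix of old and new

t2 = {"src/a.c": "v1", "src/b.c": "v1", "src/c.c": "v1"}
try:
    commit_atomically(t2, changes)
except WriteError:
    pass
# t2 is still {"src/a.c": "v1", "src/b.c": "v1", "src/c.c": "v1"}: none of the changes landed
```

Staging on a copy and swapping it in only at the end is one simple way to get all-or-nothing behavior; real systems use more elaborate mechanisms, but the guarantee to the user is the same.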
SVN remained the most popular version control system for a very long time, in fact until Git came out. But one other version control system came in between: BitKeeper SCM, a closed-source, proprietary source code management tool. One of BitKeeper's important features, though it was not the first system to have it, was distributed version control.
Before we get to that, let's talk a little more about BitKeeper being closed source, whereas most of the other systems we've been looking at are open source. BitKeeper had a paid version, but there was also a community version that was given away for free, with somewhat fewer features and some usage restrictions. That free version was used for source code management of the Linux kernel from 2002 to 2005. Using a proprietary SCM for the Linux kernel was controversial, because the kernel is an open-source project that no one owns, whereas the SCM was owned and controlled by a company. Many people objected: what if the company changed the rules in the future? The project would be stuck using that company's software.
Well, guess what? In April 2005, the community version stopped being free, and those predictions came true. BitKeeper was never as popular as CVS or SVN, but it was important to the creation of Git, because that moment in April 2005, when the community version stopped being free, is when Git was born.
Git was created by Linus Torvalds, the person who created Linux. When BitKeeper stopped being free, the Linux kernel developers needed an alternative for managing their source code. Linus looked around and didn't like the other VCSs out there, like CVS and SVN. He did like some of BitKeeper's concepts, but he thought he could do even better. So he wrote a new version control system from scratch, and that was Git.
Git is a distributed version control system, like BitKeeper; we will talk more about distributed version control later. It's also open source and free, which is great for us: people like you and me can download it and use it for free, with no license fees. And because it's open source, the community can see the source code and contribute to it by submitting bug fixes, adding new features, and so on.
It is also compatible with most platforms, including Linux, macOS, and Windows. And it is faster than most other source code management tools, in some cases up to a hundred times faster for certain operations. It also has better built-in safeguards against data corruption. These improvements paid off, and Git became a big hit. As people discovered the power of distributed version control and got used to Git's features, Git's popularity exploded.
Happy Coding …