Let's Not Store Versions in Versioned Files

It is a rather common practice to include version numbers directly in the meta files or sources stored inside a code repository. To list a few examples: npm makes it part of its regular practice, and Python's setuptools usually involves various hacks, be it normal imports, reading a file, or anything else. Even its more declarative approach limits itself to string literals and reading a file or a module attribute.

OK, so we have a version number in a configuration file, some other special file, or directly in the code. What's so wrong with that? The natural enemy of this blog - duplication. In this case, duplication of responsibilities.

There's a rather high chance that you are using Git to track changes. If not Git then Mercurial, SVN, Fossil, Darcs, or really any other version control system, distributed or not, it doesn't matter. What matters is that these tools are designed to help you control different versions of your software. When you add your software version into the source code of said software, you create a new independent layer of versioning.

Now, not everyone is a minimalist, and the mere threat of an additional entity handling the same thing might not scare you. The same goes for duplicating the version data. The problems begin when the version data is actually not duplicated between the VCS and the source code. Git's native commit identifiers - hashes - don't really fit distribution use, where Semantic Versioning makes much more sense for users. The point stands for other VCSes as well.

We end up with two distinct layers of versioning where one controls the other. This usually leads to very awkward workflows. In a commercial project I worked on, we wanted to mark the mainline branch as unstable and make that visible through the version number. For a released (and validated; the whole workflow was heavily oriented towards paperwork) piece of software we wanted a regular version number. This resulted in an interesting process for back-fixing: find the merge base between the release branch and mainline, make the fix, merge it to mainline, merge the release branch into the fix branch, increment the version file, merge to the release branch.

Another tendency that I see as a result of the duplication: the versions are out of sync or simply meaningless. Let's consider two approaches to incrementing the version number in a file: before and after the release. With the first one, the first commit of a release is the commit that increments the version number. From this commit until the next release the file is out of sync, because with each change the state becomes more and more different from the version described by the file. The second approach is: create the release - deploy the application and whatnot - and then increment the version number to what is expected to be the next release. This requires strict management of which changes get merged, or good fortune-telling skills; otherwise the predicted number is meaningless, as you cannot know in advance whether the next release will be a major, minor, or patch one.

Happily for us, most VCSes have built-in functionality to help us control version numbers that are meaningful in the context of distribution and deployment. They are called tags or, rarely, baselines. With them we can mark arbitrary repository states with arbitrary strings that can be referenced later. Usually, they are used to mark the commits that a certain release originated from. Sometimes it might even be the same commit that incremented the version number in a file. We're not doing that last part. Instead, we want to tag a commit and read the tag from our build, distribution, deployment, or packaging system.

Some of these systems support it better, some worse. Luckily, a good chunk of them allow arbitrary logic to be executed, so we can implement it ourselves. Additionally, there is a good chance that the VCS provides some kind of helper for getting a human-readable, tag-driven version description. In the case of Git there is git-describe(1), which fits this use case directly. We can just call it from CMake or setup.py and read its output. Very often we may be forced to generate some full-fledged files, so be ready to do that.
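To give an idea of what the build has to deal with: when the current commit is not exactly at a tag, git describe prints the nearest tag followed by the number of commits on top of it and an abbreviated commit hash prefixed with "g", for example v1.2.0-3-gabc1234. Below is a minimal sketch of splitting such a string, assuming plain git describe output without options like --dirty. The Description struct and the parse_describe function are names made up for this illustration.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper type for this illustration; not part of Git or CMake.
struct Description {
    std::string tag;    // nearest tag, e.g. "v1.2.0"
    int commits_since;  // commits on top of the tag; 0 when exactly at the tag
    std::string hash;   // abbreviated commit hash; empty when exactly at the tag
};

// Splits "<tag>-<count>-g<hash>"; anything else is treated as a bare tag.
Description parse_describe(const std::string& text)
{
    Description result{text, 0, ""};

    // The hash suffix is the last "-g..." part and must be hexadecimal.
    const auto hash_pos = text.rfind("-g");
    if (hash_pos == std::string::npos || hash_pos == 0)
        return result;
    const std::string hash = text.substr(hash_pos + 2);
    if (hash.empty())
        return result;
    for (const char c : hash)
        if (!std::isxdigit(static_cast<unsigned char>(c)))
            return result;

    // The commit count sits between the two last hyphens and must be numeric.
    const auto count_pos = text.rfind('-', hash_pos - 1);
    if (count_pos == std::string::npos || count_pos + 1 == hash_pos)
        return result;
    const std::string count = text.substr(count_pos + 1, hash_pos - count_pos - 1);
    for (const char c : count)
        if (!std::isdigit(static_cast<unsigned char>(c)))
            return result;

    result.tag = text.substr(0, count_pos);
    result.commits_since = std::stoi(count);
    result.hash = hash;
    return result;
}
```

A tag that merely contains "-g", such as v1.0-gamma, falls through to the bare-tag case because it lacks the numeric commit count.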

Example CMake Project

Let's consider a C++ project built by CMake. It is intended to be packaged, but is also meant to support installing it directly from the repository. The program itself is rather simple - it prints out its version. Every single time:

#include <iostream>
#include "version.h"

int main(int, char*[])
{
	std::cout << version::full << std::endl;
}

The version.h is just a namespace with an extern constant. To provide an actual value for the version number, let's write a version.cpp.in that will be processed by CMake to generate the actual source that will be added to the target:

#include "version.h"

namespace version {
// @VERSION@ is substituted by CMake's configure_file at configuration time.
const char* full = "@VERSION@";
}
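The version.h header itself is not shown in this post. A minimal sketch of what it could look like, assuming the declaration has to match the definition in version.cpp.in above (the include guard name is arbitrary):

```cpp
// version.h - a sketch of the header described in the text.
#ifndef VERSION_H
#define VERSION_H

namespace version {
// Defined in the version.cpp that CMake generates from version.cpp.in.
extern const char* full;
}

#endif
```

Since the header only declares the constant, main.cpp does not need to be recompiled when the version string changes; only the generated source does.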

The @VERSION@ is a pattern that is recognized by CMake's configure_file which we will use to generate the actual source code that is intended for compilation just like we planned:

add_executable(version-print main.cpp)
git_describe(VERSION)
configure_file(version.cpp.in ${CMAKE_CURRENT_BINARY_DIR}/version.cpp @ONLY)
target_sources(version-print PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/version.cpp)

Sadly, CMake does not support this out of the box, so you need to implement git_describe (or something similar) yourself or get an external module for it. There is one in Ryan A. Pavlik's repository, and I implemented one for Starshatter at some point. They are rather easy and fun to write (although there is a variety of edge cases), but it's still a bit of a shame that they are not part of core CMake. Note that one of these "edge cases" is triggering the regeneration of the output file: configure_file is a configuration-time action, so changes to the output of git_describe may or may not be properly detected, and the generated version can go stale until CMake is re-run. Be wary.

Alternatively, you can consider using a preprocessor define to, well, define the version number. I use this approach with a separate file that defines the variable, so that dependency checks work and only that one file needs recompilation when the version changes.
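A minimal sketch of that separate file, assuming the build system passes the version as a string-valued macro on the compiler command line. The macro name APP_VERSION is made up for this example, and the fallback value is only there so the file still compiles outside of the intended build:

```cpp
// version.cpp - the only translation unit that sees the macro, so a version
// change recompiles just this file.
#ifndef APP_VERSION
#define APP_VERSION "unknown"  // fallback when the build system passes nothing
#endif

namespace version {
const char* full = APP_VERSION;
}
```

The rest of the program keeps using version::full through the header and never touches the macro directly.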

Final Thoughts

Of course, full-fledged support would be way nicer and more stable, but surprisingly, we're not there yet.

Well, some of us are. To contrast with the list of bad examples, consider the Go programming language, which recommends this method of versioning. Interestingly, it also uses code repositories as a form of package distribution, so any argument that a file with a version is needed because the repository is used as a means of distribution is baseless.

What to take away from this post? Next time you start a project, consider keeping the meaningful version only in the VCS. If your build/distribution/whatever-else system does not fully support it out of the box - try implementing it. Once you have a working implementation - push it upstream. Who knows, maybe in some time we will have consistent support all across the ecosystem. As for now, back to experimenting, and until next time!