How to Generate Files From Templates in Shell

This is a total rewrite of an old article. I no longer liked it and my methods have changed.

Generating or "configuring" files from a template is a common task. A prime example from my usual toolbox is configure_file in CMake. Another is service configuration files sitting in a staging location before deployment (e.g., version-controlled configs for an e-mail or web server). Today let's focus on these kinds of uses rather than on verbose template engines for, e.g., HTML.

Common notations for marking replacement spots in templates are $VARIABLE, ${VARIABLE}, and @VARIABLE@. The first two come directly from the shell notation for variable substitution. The last one exists precisely to differ from them, for templates whose generated output uses $ as part of its own syntax.

Using shell itself

POSIX-compliant shells support a mechanism called heredoc. We can use it in combination with cat(1):

#!/bin/sh
cat <<CONTENT
server {
	listen 80;
	server_name $USER.$DOMAIN;
	root /srv/http/$USER/public;
}
CONTENT

This approach has an obvious problem: it isn't really the template I promised. Instead, it is a script that generates the intended output. Someone could also argue that this is a useless use of cat.
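Since the script expands variables like any other shell code, an unset variable silently becomes an empty string in the output. A minimal sketch of one way to guard against that, assuming the output is wrapped in a function (render_server is a hypothetical name, not part of the script above) and using the standard ${VAR:?message} expansion to abort when a variable is unset or empty:

```shell
#!/bin/sh
# render_server: print the server block, but refuse to continue
# when USER or DOMAIN is unset or empty.
render_server() {
	# ":" does nothing; the expansions perform the actual check.
	: "${USER:?USER is not set}" "${DOMAIN:?DOMAIN is not set}"
	cat <<CONTENT
server {
	listen 80;
	server_name $USER.$DOMAIN;
	root /srv/http/$USER/public;
}
CONTENT
}
```

Calling render_server with DOMAIN unset prints the message to standard error and stops a non-interactive shell instead of emitting a broken config.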

Using cat and heredocs gives us a lot of flexibility. We can wrap some content with a common header and footer if we want to:

#!/bin/sh
cat /dev/fd/3 "$@" /dev/fd/4 3<<HEAD 4<<FOOT
<!doctype html>
<html lang="en">
HEAD
<script src="script.js"></script>
FOOT
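
The /dev/fd paths are widely available but, as far as I know, not guaranteed by POSIX. A sketch of the same header-and-footer wrapping without them, assuming a hypothetical wrap function that simply runs cat three times:

```shell
#!/bin/sh
# wrap FILE...: print the header, the given files, then the footer.
# With no arguments, cat reads the body from standard input instead.
wrap() {
	cat <<HEAD
<!doctype html>
<html lang="en">
HEAD
	cat "$@"
	cat <<FOOT
<script src="script.js"></script>
FOOT
}
```

It trades the single cat call for three, which is easier to read but no longer a one-liner.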

Using envsubst

If we want a real template instead of an executable script, we can use envsubst(1). This tool is extremely straightforward to use: feed the template to standard input and get the substituted text on standard output. Take a template saved as nginx.conf.in:

server {
	listen 80;
	server_name $USER.$DOMAIN;
	root /srv/http/$USER/public;
}

Then:

$ export DOMAIN=example.tld
$ envsubst <nginx.conf.in
server {
	listen 80;
	server_name aki.example.tld;
	root /srv/http/aki/public;
}

Envsubst supports ${VARIABLE}, too.

A major potential problem with envsubst is that it substitutes everything as it goes, whether or not the variable exists in the environment; unset variables simply become empty strings. This is the usual behaviour expected from a shell, but it is poorly suited to output that uses $ in a meaningful way. We can partially work around it with the SHELL-FORMAT argument:

$ envsubst '$USER, $DOMAIN' <nginx.conf.in >nginx.conf

This limits the substitutions to the selected variables. The exact format of this argument is not important, because envsubst only picks out the variable references in it: '$USER$DOMAIN', '$USER $DOMAIN', '$USER,$DOMAIN', and the first example are all equivalent. Just remember to single-quote it, so that the shell passes it as one argument and does not expand the variables itself before envsubst sees them.
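
When the list of variables grows, the SHELL-FORMAT argument can be built instead of typed. A small sketch, assuming a hypothetical shell_format helper that turns variable names into ${NAME} references; the template line with $uri is made up for the demonstration:

```shell
#!/bin/sh
# shell_format NAME...: print "${NAME} " for each given variable name.
# The single quotes keep the shell from expanding anything here.
shell_format() {
	for name in "$@"; do
		printf '${%s} ' "$name"
	done
}

# Run the demo only where envsubst (part of GNU gettext) is installed.
if command -v envsubst >/dev/null 2>&1; then
	# $uri is not listed, so envsubst leaves it untouched.
	printf 'root /srv/http/$USER/$uri;\n' |
		USER=aki envsubst "$(shell_format USER DOMAIN)"
fi
```

Only $USER is replaced in the sample line; $uri passes through for the generated file to use.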

Using sed

Finally, we can use sed(1) to gain even more control over what happens to our templates. This comes at a cost: sed does not have access to environment variables on its own. It is usually used like this:

$ sed "s/@VERSION@/$VERSION/g" <version.h.in >version.h

The shell substitutes $VERSION with the variable's value before sed runs, and sed then replaces every @VERSION@ in the template file. Note that sed can match any pattern we like; the @ signs are there to make the match stricter and the template more readable.
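
The one-liner breaks as soon as the value contains a character that is special in the replacement part, such as / (the delimiter), & or a backslash. A minimal sketch of escaping the value first, assuming a hypothetical escape_replacement helper and single-line values:

```shell
#!/bin/sh
# escape_replacement VALUE: backslash-escape \, / and & so that VALUE
# is taken literally in the replacement part of s/.../.../.
# A comma serves as the delimiter here to keep / easy to match.
escape_replacement() {
	printf '%s\n' "$1" | sed 's,[\\/&],\\&,g'
}

# Hypothetical value that would break the plain one-liner.
VERSION='release/1.0&beta'
printf '#define VERSION "@VERSION@"\n' |
	sed "s/@VERSION@/$(escape_replacement "$VERSION")/g"
```

The helper uses printf rather than echo, since some shells' echo would interpret backslashes in the value.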

If we feel like over-engineering, we can generate a script for sed and use that instead. The variable values may come from anywhere at that point; let's use shell:

#!/bin/sh
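# Use the tag name when HEAD sits exactly on a tag.
if tag=$(git describe --tags --exact-match 2>/dev/null); then
	echo "s/@VERSION@/$tag/g"
else
	echo "s/@VERSION@/@BRANCH@-@HASH@/g"
fi
echo "s/@HASH@/$(git rev-parse --short HEAD)/g"
# symbolic-ref fails on a detached HEAD, hence the fallback.
echo "s/@BRANCH@/$(git symbolic-ref --short HEAD 2>/dev/null || echo detached)/g"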

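Here I used echo rather than cat. The order of the rules matters: the @VERSION@ rule comes first, because it may itself introduce @BRANCH@ and @HASH@, which the later rules then fill in. Depending on the state of the repository, the script may output something like: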

s/@VERSION@/@BRANCH@-@HASH@/g
s/@HASH@/4242424/g
s/@BRANCH@/nightly/g

We can then feed it into sed:

$ sed -f subst.sed <version.h.in >version.h

Now, if we want to over-engineer it for real, let's put it into a Makefile:

subst.sed: subst.sed.sh
	./$< >$@

%.h: %.h.in subst.sed
	sed -f subst.sed <$< >$@

Other Alternatives

Otherwise, one could use perl(1), python(1), awk(1), or maybe the shell's eval if feeling adventurous (and malicious, I guess). CMake's configure_file is very nice but is limited to CMake. I'm starting to feel like writing such a utility could be a nice weekend project after a beer or two.