Using pacman to Manage Emscripten Packages

C was created for use with Unix, and it quickly became one of the most used programming languages of all time. Some years later it made its way into the Linux kernel and the operating systems built around it. To this day it is the primary language used to interface with the kernel or to write any sort of utility, if not directly then through various bindings.

To allow the use of external libraries, C has a mechanism for including header files in your own source code. During the linking stage, the compiled implementation behind these headers is linked with your code into the final executable (or loaded by the dynamic linker, with some additional steps). What is visible for including and linking is configured through several PATH-like variables with some defaults and sometimes (if you were having a good day and it had to be ruined) hidden or undocumented behaviour.

Management of the available packages that contain headers and libraries is usually offloaded to the system-wide package manager. Considering the relation between C and the system it's hosted by, it isn't that bad of a choice. Now, it is not perfect, but with well-maintained upstream and local ecosystems it'll be just right.

Problems may appear when we change or take away one of the parts: the operating system, the C toolchain, or the package manager. The most prominent examples are Windows, cross-compiling, and porting software between distros. Such cases, especially the first one, resulted in the creation of external package managers such as vcpkg and Conan. In other cases they pushed people toward build generators such as CMake.

Recently, I've been playing around with Emscripten. I built some things here and there, and now, you guessed it, I'm trying out different approaches to handling libraries and decided to explore pacman(8) as a means to that end. I hope you enjoy this little experiment.

Without going into internals, pacman is a package manager used by Arch Linux, a distribution that describes itself as lightweight, flexible, and simple. It focuses on bleeding-edge packages. I picked it because I happen to use it on a daily basis.

Packages are distributed in binary form and come from remote repositories. A package is an archive that contains the files meant for installation plus some meta information, all built by makepkg(8). A repository is really just a set of files managed by repo-add(8).
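
To make the "archive with files and meta information" part a bit more tangible, here is a minimal sketch that assembles a mock package by hand and lists it. It is only an illustration under loud assumptions: the archive is an uncompressed tar rather than something makepkg produced, the payload files are empty placeholders, and the three .PKGINFO fields are just a small subset of what a real package records.

```shell
# Mock of a package layout: metadata at the archive root, payload under usr/.
# This is NOT produced by makepkg; the contents are placeholders.
work=$(mktemp -d)
mkdir -p "$work/pkg/usr/include" "$work/pkg/usr/lib"
: > "$work/pkg/usr/include/raylib.h"
: > "$work/pkg/usr/lib/libraylib.a"

# .PKGINFO is a plain "key = value" file describing the package.
cat > "$work/pkg/.PKGINFO" <<'EOF'
pkgname = raylib
pkgver = 4.0.0-1
arch = wasm32
EOF

# Pack it and list the contents, the way a real package could be inspected.
tar -C "$work/pkg" -cf "$work/mock.pkg.tar" .PKGINFO usr
listing=$(tar -tf "$work/mock.pkg.tar")
printf '%s\n' "$listing"
```

A real package can be inspected in much the same way, for example with pacman -Qlp followed by the package file name.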

Building Sample Package

I started by creating a sample package that provides raylib. To do that, I wrote a rather simple PKGBUILD file:

pkgname=raylib
pkgver=4.0.0
pkgrel=1
arch=(wasm32)
license=(zlib)
makedepends=(cmake emscripten)
source=("${pkgname}.tar.gz::https://github.com/raysan5/raylib/archive/refs/tags/${pkgver}.tar.gz")
sha256sums=("11f6087dc7bedf9efb3f69c0c872f637e421d914e5ecea99bbe7781f173dc38c")

Stop, right now! If you are a seasoned package maintainer, or maybe you've just cross-compiled enough software, you will notice that something is not right here. Yeah, arch is wrong. It's a little bit counter-intuitive, so take a look at another example: aarch64-linux-gnu-glibc, the GNU C Library for ARM64 targets:

$ asp checkout aarch64-linux-gnu-glibc
$ cd aarch64-linux-gnu-glibc/trunk
$ grep arch= PKGBUILD
arch=(any)

This is different for a good reason: nothing in this package is going to run on the host system. Only the compiler and binutils run there, and those are the packages actually built for the architecture of the build host: x86_64 in this case.

Then why am I specifying wasm32 for my package?

Emscripten uses cache directories that each contain a copy of the sysroot. A host system may contain several caches, and each will have its own sysroot. I'm not entirely sure what the reasoning behind it is, but that's how it looks at the time of writing.

The glibc package specifies the any architecture because it is intended to be installed in /usr/aarch64-linux-gnu, and that's where the compiler expects to see it. I could technically make my package operate in a similar manner and install to /usr/lib/emscripten/system, which acts as the base for caches and is provided by the emscripten package from the Arch Linux repositories. I didn't do that because I wanted installed packages to be immediately available in my cache. To accomplish that, I decided to use pacman the way you would when bootstrapping a new system installation, and because the package is technically targeted at wasm32, I wrote that in the PKGBUILD.

I think the normal way is also worth exploring, assuming that I first figure out how to deal with caches, why the emscripten package does not install to the usual /usr/wasm32-emscripten, and how to handle propagation of packages.

Anyway, I went the other way and had to hack my way through. Let's continue with the PKGBUILD:

build() {
  cd "${pkgname}-${pkgver}"
  emcmake cmake . -B build \
    -DPLATFORM=Web \
    -DBUILD_EXAMPLES=OFF \
    -DCMAKE_INSTALL_PREFIX=/usr
  cd build
  make
}

package() {
  cd "${pkgname}-${pkgver}/build"
  make DESTDIR="${pkgdir}" install
  cd ..
  install -Dm644 LICENSE "${pkgdir}/usr/share/licenses/${pkgname}/LICENSE"
}

I use the CMake wrapper from the Emscripten tools. The only part worth noting is that, by default, CMake with Emscripten would set CMAKE_INSTALL_PREFIX to the path of the currently used cache directory. That's not feasible for staging packages meant for distribution, so I use plain /usr instead. The thing is, Emscripten uses the include and lib directories located directly in the sysroot and not under /usr, so I will need to adjust that somehow at a later stage. Not now, though, because raylib uses GNUInstallDirs, which expands a / prefix to /usr.
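
To illustrate how the install prefix and DESTDIR combine during staging, here is a small sketch with stand-in paths. No CMake or make is involved: the temporary directory plays the role of makepkg's pkgdir, and the library is an empty placeholder file, so treat it as a picture of the path arithmetic rather than of the actual build.

```shell
# Sketch of DESTDIR staging with stand-in paths (no CMake involved).
pkgdir=$(mktemp -d)      # stand-in for makepkg's ${pkgdir}
prefix=/usr              # the value given to CMAKE_INSTALL_PREFIX in build()

# "make DESTDIR=... install" writes to DESTDIR + prefix + relative path.
staged_lib="${pkgdir}${prefix}/lib/libraylib.a"
mkdir -p "$(dirname "$staged_lib")"
: > "$staged_lib"        # empty placeholder instead of a real library

# Path as it appears inside the package, i.e. relative to the install root.
packaged_path="${staged_lib#"$pkgdir"}"
echo "staged on disk: $staged_lib"
echo "inside package: $packaged_path"
```

The second path is the one that ends up under whatever root the package is later installed into, which is why a prefix pointing at one particular cache directory would not travel well.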

The package is ready to be built:

$ makepkg --printsrcinfo >.SRCINFO
$ CFLAGS='' CARCH=wasm32 makepkg
==> Making package: raylib 4.0.0-1
==> ...
==> Finished making: raylib 4.0.0-1
$ ls *.pkg.tar.zst
raylib-4.0.0-1-wasm32.pkg.tar.zst

First off, I clear CFLAGS to keep the default options from /etc/makepkg.conf from causing problems. I also need to set CARCH to inform makepkg that I'm cross-compiling to wasm32.
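
The reason CARCH matters is that makepkg refuses to build a package whose arch list does not cover the architecture it believes it is building for. The check can be pictured roughly like the sketch below. The arch_supported helper is hypothetical and written for illustration only; it is not a function taken from makepkg.

```shell
# Simplified illustration of matching CARCH against a PKGBUILD's arch list.
# arch_supported is a hypothetical helper, not a function from makepkg.
arch_supported() {
  carch=$1
  shift
  for a in "$@"; do
    # "any" marks architecture-independent packages and matches everything.
    if [ "$a" = "any" ] || [ "$a" = "$carch" ]; then
      return 0
    fi
  done
  return 1
}

arch_supported wasm32 wasm32 && echo "wasm32 package, CARCH=wasm32: ok"
arch_supported x86_64 any    && echo "any package, CARCH=x86_64: ok"
arch_supported x86_64 wasm32 || echo "wasm32 package, CARCH=x86_64: rejected"
```

The last case is what would happen to this PKGBUILD on an x86_64 host without the CARCH override.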

Setting up Repository

Now that I had the package, I needed to "distribute" it. Repositories used by pacman are dead simple. They can be served over HTTP, FTP, or even from local files. The structure is the same for all methods and relies on the file system, paths, and a central database file. The whole setup was:

$ mkdir -p repo_path/wasm32/core
$ cd repo_path/wasm32/core
$ mv package_path/raylib-4.0.0-1-wasm32.pkg.tar.zst .
$ repo-add core.db.tar.gz *.pkg.tar.zst

Yeah, that's it. First, create a directory for the repository and move into it. The path contains both the architecture and the name of the repository. After that, move the built package to the same directory, and finally add it to the database, which has the same name as the repository. Now it's a matter of making pacman use it.

Installing the Package

This section may contain wrong uses of tools for the sake of experimentation. If you are faint-hearted or feel the need to say "this is not how to do it" or "this is not how you use it" without elaborating or suggesting another direction, then it's probably better for you to not continue or to have a drink first.

Before doing anything, I fixed the directory structure of the cache to match the one that pacman expects:

$ cd cache/sysroot
$ mkdir usr
$ mv include lib bin usr
$ ln -s usr/{include,lib,bin} .

Symlinks should make everyone happy for now.
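
If you'd rather rehearse this before touching a real cache, the same steps can be tried on a throwaway tree. This is just a sketch: the temporary directory and the empty raylib.h are placeholders, and the loop replaces the brace expansion so that it also runs in a plain POSIX shell.

```shell
# Rehearse the restructuring on a throwaway tree instead of a real cache.
root=$(mktemp -d)
mkdir "$root/include" "$root/lib" "$root/bin"
: > "$root/include/raylib.h"   # placeholder file to track through the move

mkdir "$root/usr"
mv "$root/include" "$root/lib" "$root/bin" "$root/usr"

# Relative links keep working if the whole sysroot is moved or copied.
for d in include lib bin; do
  ln -s "usr/$d" "$root/$d"
done

link_target=$(readlink "$root/include")
echo "include -> $link_target"
ls "$root/include"
```

The old paths keep resolving through the links while the files themselves now live under usr.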

The next step was to create the directories used directly by pacman:

$ mkdir -p etc/pacman.d/{gnupg,hooks} var/{cache/pacman,lib/pacman,log}

And finally, the first thing that's worth attention: the config file located at etc/pacman.conf. The plan was to use pacman in a bootstrap fashion for the sysroot located in the cache directory, so I needed to express that in config terms:

[options]
RootDir = cache/sysroot/
CacheDir = cache/sysroot/var/cache/pacman
HookDir = cache/sysroot/etc/pacman.d/hooks
GPGDir = cache/sysroot/etc/pacman.d/gnupg
Architecture = wasm32
CheckSpace
SigLevel = TrustAll

Some directories were automatically re-rooted and some weren't. I simply experimented with the -v option to see what was used and adjusted the config until I ended up with this version. I don't need to mention TrustAll. Don't do it.
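
Since the same sysroot path repeats on almost every line, one option is to generate the file from a single variable. The sketch below assumes a placeholder sysroot inside a temporary directory; it writes exactly the options listed above and nothing more, TrustAll included, so the same warning applies.

```shell
# Generate the [options] section from a single variable.
# The sysroot location is a placeholder; point it at your own cache.
sysroot=$(mktemp -d)/cache/sysroot
mkdir -p "$sysroot/etc"
conf="$sysroot/etc/pacman.conf"

# Same options as in the hand-written config, with the path filled in.
cat > "$conf" <<EOF
[options]
RootDir = $sysroot/
CacheDir = $sysroot/var/cache/pacman
HookDir = $sysroot/etc/pacman.d/hooks
GPGDir = $sysroot/etc/pacman.d/gnupg
Architecture = wasm32
CheckSpace
SigLevel = TrustAll
EOF

cat "$conf"
```

Absolute paths also take away the dependency on the directory pacman happens to be started from.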

That's not all; repositories also reside in the config file:

[core]
Server = file:///repo_path/$arch/$repo
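
pacman fills in $arch from the Architecture option and $repo from the section name. The substitution can be mimicked with a couple of lines of shell to check that the result lines up with the directory created earlier; the sed call is only an illustration of the expansion, not how pacman does it internally.

```shell
# Mimic how $arch and $repo in a Server line are substituted.
# Values mirror the config above; the sed call is only an illustration.
server='file:///repo_path/$arch/$repo'
arch=wasm32
repo=core

url=$(printf '%s\n' "$server" | sed -e "s|[\$]arch|$arch|g" -e "s|[\$]repo|$repo|g")
echo "$url"   # prints file:///repo_path/wasm32/core
```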

What's left is to sync the database and install the package. pacman assumes that it needs to be run as the root user, but because I'm working with a user-owned cache as my root directory, I'd prefer not to raise its privileges, especially considering that a misconfiguration could break packages on the host system. Let's try it out:

$ fakeroot pacman --config cache/sysroot/etc/pacman.conf -Sy
:: Synchronising package databases...
 core   418.0   B   408 KiB/s 00:00 [###################################] 100%
$ fakeroot pacman --config cache/sysroot/etc/pacman.conf -S raylib
resolving dependencies...
looking for conflicting packages...

Packages (1) raylib-4.0.0-1

Total Download Size:   2.04 MiB
Total Installed Size:  4.70 MiB

:: Proceed with installation? [Y/n]
:: ...
(1/1) installing raylib             [###################################] 100%

Looks like the installation process succeeded. Time to try it out.

Trying It Out and Adjusting pkg-config

Turns out it doesn't work just yet. Some samples would work but not this one.

The raylib CMake module has a very peculiar way of defining its target. First it asks pkg-config for hints and then uses them in a slightly inconsistent way. Long story short, the CMake target will have its linker options set based on the output of pkg-config --libs --static, disregarding any attempts to remove -L options.

Since I built my package with CMAKE_INSTALL_PREFIX set to /usr, the prefix variable in the installed raylib.pc will be set to /usr. This results in -L/usr/lib appearing in the public linker options of the raylib target, which breaks the entire build process.

The problem here is the prefix=/usr in the module definition file. It should point to the actual root, which is located in the cache directory.

There are several ways to address it. My favourite was to simply rewrite the prefix as part of an install hook run by pacman. Sadly, it failed because hooks are run in a chroot. There are ways to fake it, but I didn't find them worth exploring at that moment. The other way was PKG_CONFIG_SYSROOT_DIR, and that's what I did. I had tried to avoid it due to the uncertain situation between pkgconf and pkg-config.
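
For reference, the prefix rewrite itself is a one-liner when run by hand outside of any hook. The sketch below works on a stand-in sysroot with a trimmed, made-up raylib.pc. Both the file contents and its location under usr/lib/pkgconfig are assumptions for illustration, not a copy of what the package installs.

```shell
# Stand-in sysroot with a trimmed, made-up raylib.pc (not the real file).
sysroot=$(mktemp -d)
pc="$sysroot/usr/lib/pkgconfig/raylib.pc"
mkdir -p "$(dirname "$pc")"
cat > "$pc" <<'EOF'
prefix=/usr
libdir=${prefix}/lib
includedir=${prefix}/include

Name: raylib
Description: placeholder entry for illustration
Version: 4.0.0
Libs: -L${libdir} -lraylib
Cflags: -I${includedir}
EOF

# Rewrite only the prefix line; the other variables derive from it.
sed "s|^prefix=.*|prefix=$sysroot/usr|" "$pc" > "$pc.tmp" && mv "$pc.tmp" "$pc"

new_prefix=$(sed -n 's|^prefix=||p' "$pc")
echo "prefix is now $new_prefix"
```

With PKG_CONFIG_SYSROOT_DIR pointed at the sysroot instead, the file stays as installed and the sysroot path gets prepended to the -I and -L paths at query time.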

Luckily, PKG_CONFIG_SYSROOT_DIR turned out good enough for me to wrap up the whole experiment. I patched the Emscripten.cmake toolchain file and was able to build a sample project that used the installed sample package.

Should I show something here? Nah.

Final Notes

This was a fun experiment. For some reason I really enjoyed that fakeroot use.

Package management, or rather dependency management, in a cross-compilation context sounds like a good next direction to explore. I found various takes on it. GNU is a little bit more standardized, and there are projects like crosstool-NG that at the very least ease the configuration of toolchains. I couldn't find many examples of installable binary packages for a target, with the exception of the standard library. Instead, it seems that the usual approach is compiling ports yourself (which is fine) from, e.g., incredibly complex CMake trees (which is fine, but with flames in the background). Otherwise, it's using vcpkg or a similar manager. Or doing something wild.

As for anything else worth noting... I hope I pointed out everything that I wanted in the article itself. If not, well, it happens.