
2. TeXLive Docker Image

nguyenvukhang edited this page Oct 21, 2024 · 1 revision

Because reproducibility is a key feature of this project, the entire book's PDF is built on GitHub Actions. For this, we need LaTeX. None of the GitHub Actions runner images come with LaTeX pre-installed, so we need a way to get LaTeX up and running on the CI container. Installing LaTeX on every CI run not only takes significant time but also adds a layer of uncertainty.

The initial solution was to use xu-cheng/latex-action. It pulls a lightweight Docker image with LaTeX installed and uses that to build the PDF. While it's an awesome GitHub Action in its own right, the inconvenience is that we can only tell it to run pdflatex on compile-ready .tex files. That is, we need to export a main.tex that contains everything in the book, and then process that with pdflatex. This breaks the core compile flow and requires a secondary one that only runs on CI. Having an extra compile flow creates a separation between the PDF that's built while writing locally and the PDF that is exported on CI runs. So while this solution works (and was the solution for a long time), I knew that I had to keep searching for a better one.

The next solution I considered was the more general xu-cheng/texlive-action. Instead of specifying the *.tex files to compile, it lets users run arbitrary commands on the container. However, that means running the minimath binary natively on the container, which requires setting up Rust and then building the binary there. That's too much complexity to run on an auxiliary container, so I quickly abandoned this direction.

Fast-forward to today's solution: open a subprocess to pdflatex within a Docker container and pipe to its stdin. Honestly, I didn't know this was possible and I was pretty stoked when I first got it to work.

First, I forked xu-cheng/latex-docker and published an even more minimal version here. We can then create an executable file called pdflatex.sh with the following contents:

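#!/bin/sh
# Run pdflatex inside the texlive-small container as if it were installed locally.
# "$PWD" and "$@" are quoted so that paths and arguments containing spaces survive.
docker run \
  --interactive \
  -v "$PWD":/tmp \
  --workdir /tmp \
  --env TEXINPUTS \
  ghcr.io/libmath/texlive-small \
  pdflatex "$@"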

With this script, we can use pdflatex as if it were installed on the local machine: just run ./pdflatex.sh with the arguments normally passed to pdflatex. The idea is as follows. The -v flag mounts the current working directory, captured by $PWD, at /tmp in the container, and --workdir makes /tmp the directory that pdflatex runs in. The --interactive flag keeps stdin open so that input piped into the script reaches pdflatex. The --env TEXINPUTS flag, given without a value, forwards the host's TEXINPUTS environment variable into the container. Finally, we spawn the pdflatex command and pass on all the arguments with $@.
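
The pipe-to-stdin idea can be sketched in a few lines of shell. This is only an illustration, not how minimath does it: it assumes pdflatex reads the document from standard input when no input file is named, and the names emit_tex, compile_from_stdin, the PDFLATEX variable and the job name demo are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a tiny, complete LaTeX document to stdout.
emit_tex() {
  printf '%s\n' \
    '\documentclass{article}' \
    '\begin{document}' \
    'Hello from stdin.' \
    '\end{document}'
}

# Hypothetical helper: forward our stdin to pdflatex.
# PDFLATEX defaults to the wrapper script from this page (assumed to sit in
# the current directory). -jobname names the output, since there is no input
# file name to derive it from.
compile_from_stdin() {
  "${PDFLATEX:-./pdflatex.sh}" -jobname="$1"
}
```

Running `emit_tex | compile_from_stdin demo` should leave demo.pdf in the current directory, because that directory is mounted as the container's working directory.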

This is the current method of operation on GitHub Actions. Judging by the 0-1s pull time of the GHCR image, it seems like GHCR-hosted images are cached. Cached or not, pdflatex is now available on every CI run in effectively no time at all. The texlive-small image is rebuilt only once a month to fetch the latest copy of LaTeX, so every PDF compiled between rebuilds uses the exact same version of LaTeX, because it comes from the same image. So now we have a way to get LaTeX installed both quickly and predictably.
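
Since the image changes once a month, it can be handy to record which build produced a given PDF. The sketch below prints the digest of the locally pulled image. It is an assumption on my part rather than something the CI currently does: image_digest and the DOCKER variable are hypothetical names, and it only works for images pulled from a registry, since locally built images carry no repo digest.

```shell
#!/bin/sh
# Hypothetical helper: print the registry digest (name@sha256:...) of a pulled image.
# DOCKER can be overridden for testing; it defaults to the real docker CLI.
image_digest() {
  "${DOCKER:-docker}" image inspect \
    --format '{{index .RepoDigests 0}}' \
    "$1"
}
```

Calling `image_digest ghcr.io/libmath/texlive-small` after a pull and logging the result ties each PDF to one image build. Using the printed name@sha256:... reference in place of the plain image name would pin pdflatex.sh to that exact build.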

 
