The hack and complexity of Package Manager in Zig 0.11.0
Ed Yu (@edyu on Github and @edyu on Twitter) Oct.18.2023
Zig is a modern systems programming language, and although it claims to be a better C, many people who initially didn't need systems programming were attracted to it because its syntax is simpler than that of alternatives such as C++ or Rust.
However, due to the power of the language, some of the syntax is not obvious to those first coming to the language. I was actually one such person.
Several months ago, when I first tried out the new Zig package manager, Zig 0.11.0 had not yet been officially released. Not only was the language unstable, but the package manager itself also had a lot of stability issues, especially with TLS. I had to hack together a system that worked for my needs, and I documented my journey in Zig Package Manager - WTF is Zon.
Since then I've discussed the Zig package manager with Andrew and various others through the Zig Discord and Ziggit, and I even opened a GitHub issue.
Now that Zig 0.11.0 was released in August 2023 and many of the stability problems have been resolved, I want to revisit my hack to see whether I can do a better one.
A special shoutout to my friend InKryption, who was tremendously helpful in my understanding of the package manager. I wouldn't be able to come up with this better hack without his help.
As I mentioned in my previous article, I changed my typical subtitle of power and complexity to hack and complexity because not only was Zig 0.11.0 (which first introduced the package manager) not released yet but also I had to do a pretty ugly hack to make it work.
I just want to reiterate my stance on Zig and the package manager. I'm not writing this to discourage you from using it but to set the right expectation and hopefully help you in case you encounter similar issues.
Zig along with its package manager is being constantly improved and I'm looking forward to the 0.12.0 release.
Today, I'll introduce a better hack than what I had to do in June 2023, and ideally I can retire my hack after the 0.12.0 release.
I'll most likely write a follow-up article once Zig 0.12.0 is released (hopefully) by the end of the year.
I will not reiterate concepts introduced in Part 1, so please read that first if you find this article confusing.
One of my previous misunderstandings of the package manager was that I was using a Zig package as a library.
Let's reuse the same example of C -> B -> A from Part 1, in which our program C depends on package B, which in turn depends on package A.
The way I was building program C and packages B and A was basically to copy everything package A produced over to package B, and then copy both what package B produced and what package A produced over to program C as part of the build process. The thing that is produced is called an artifact in the Zig package manager.
That was not the correct way to use a package manager, because one of the benefits of a package manager is that you only need to concern yourself with the packages you depend on directly, without needing to care about the additional packages those direct dependencies depend on themselves.
In the example of C -> B -> A, program C should only know or care about package B and should not need to care at all that package B needs package A internally, because the package manager should take care of the transitive dependencies.
In other words, a package manager should provide good enough encapsulation for packages that users need not care about packages not directly required by their own programs.
As an example, despite many of the dependency problems, npm does a good job (probably too good a job) of encapsulation.
It's so good that sometimes when you add one package, you might be surprised when npm automatically pulls down hundreds of packages, because it recursively downloads all dependencies.
However, such clean encapsulation is not always possible when we are building native programs in Zig especially when shared libraries are involved.
In addition to artifacts, the Zig package manager also has the concept of a module, but it mainly refers to Zig source code and is primarily used so that your program can import the Zig package as a library.
A module is equivalent to a Zig library (source code) exposed by the package manager. A module is not useful when the binary library you depend on is not written in Zig.
When building your program, you need access to the artifact produced by the dependency in order to access the specific items produced by such dependency.
To summarize, if your package is written in Zig, then you can access the Zig code in the package as a module, and you can access the shared library, static library, or executable produced by the package as artifacts. However, if your package is not written in Zig, then you need to do some additional work to expose the code/library as a module and expose the resulting items as part of the artifact.
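To make the distinction concrete, here is a minimal sketch of how a program's build.zig might consume both a module and an artifact from a Zig-written dependency under Zig 0.11.0. The dependency name mylib, the executable name app, and the path src/main.zig are placeholders I made up for illustration; they are not taken from any of the packages in this article.

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // "mylib" is a hypothetical dependency name; it would have to be
    // declared under .dependencies in this project's build.zig.zon.
    const dep = b.dependency("mylib", .{
        .target = target,
        .optimize = optimize,
    });

    const exe = b.addExecutable(.{
        .name = "app", // hypothetical executable name
        .root_source_file = .{ .path = "src/main.zig" },
        .target = target,
        .optimize = optimize,
    });

    // Module: Zig source code from the dependency, importable in our
    // program as @import("mylib").
    exe.addModule("mylib", dep.module("mylib"));

    // Artifact: the compiled library produced by the dependency's own
    // build, linked into our executable.
    exe.linkLibrary(dep.artifact("mylib"));

    b.installArtifact(exe);
}
```

With something like this in place, the program would import the Zig API through the module while the linker picks up the compiled code through the artifact.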
The main problem I had to deal with was that the Zig package manager revolved around the idea of an artifact, which requires a Compile step involving compilation and/or linking. As stated earlier, an artifact is what is produced as part of the build process. Where this falls apart is when we need to package together items that do not require a build (Compile) step.
Hence, the existing artifact concept doesn't work well when we have to deal with a package composed of an existing binary library, such as a shared library that doesn't require any additional compilation or linking. Note that this can be the case even if you have the source code, because you may not want to compile the source code yourself if the project publishes binary packages as part of its releases.
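As a sketch of why this is awkward, here is roughly what the build.zig of a Zig-written package looks like when it exposes both a module and an artifact in Zig 0.11.0. The name mylib and the path src/main.zig are hypothetical. Notice that b.installArtifact takes the Compile step returned by b.addStaticLibrary, so a prebuilt shared library, which never goes through a Compile step, has nothing to hand over.

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // Module: only names a Zig source file; nothing is compiled here.
    _ = b.addModule("mylib", .{
        .source_file = .{ .path = "src/main.zig" },
    });

    // Artifact: this creates a Compile step, which is the thing the
    // package manager knows how to hand to dependents.
    const lib = b.addStaticLibrary(.{
        .name = "mylib",
        .root_source_file = .{ .path = "src/main.zig" },
        .target = target,
        .optimize = optimize,
    });

    // Installing the Compile step is what lets a dependent find it
    // with dep.artifact("mylib").
    b.installArtifact(lib);
}
```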
I'll reintroduce the problem mentioned in Part 1.
The scenario is quite common in projects that use packages written in a different language from the main project:
A: You often need the shared or static library from the package written in another language, compiled for your environment (such as Linux).
B: You also need to write a wrapper for that library in your native language.
C: You then write your program calling the functions provided by the wrapper B.
Our concrete example has 3 packages A, B, and C. Our program my-wtf-project is in package C, which needs to use DuckDb for its database needs.
The project C will use the Zig layer provided by package B, which in turn will need the actual DuckDb implementation provided by package A.
For our my-wtf-project, our main program will call the Zig library provided by zig-duckdb. zig-duckdb is just a Zig wrapper around libduckdb, which provides the dynamic library of release 0.9.1 of DuckDb.
To use the C -> B -> A example from the earlier section, program C is our project my-wtf-project, package B is zig-duckdb, and package A is libduckdb.
Note that package B used to be called duckdb.zig but it has since been renamed to zig-duckdb.
In Part 1, there were two hacks I had to do in the build.zig of package A (libduckdb), package B (zig-duckdb), and program C (my-wtf-project):
- In the build.zig of libduckdb, I had to create an artifact even though libduckdb.so is a shared library that doesn't need additional compilation/linking, by creating a new static library linked to libduckdb.so just so I could use the artifact in zig-duckdb.
- I had to use installHeader in every build.zig to install both duckdb.h and libduckdb.so, copying these 2 files over to zig-out/include and zig-out/lib respectively.
The better hack is to expose these files through modules instead. I'm still calling this a hack because, as stated, a module is mainly meant to refer to Zig source code that your program can import as a library. Just as a shared library is not meant to be installed via calls that install header files, a module is not meant to refer to individual artifacts in a package. However, this is exactly what I had to do.
I believe this is better than how I was using installHeader and installLibraryHeaders to install artifacts produced by dependencies.
A big benefit of using modules to refer to non-Zig-produced artifacts is that we no longer need to copy artifacts over from the dependencies.
A: libduckdb
DuckDb is written in C++, and the libduckdb-linux-amd64 release from duckdb only provides 3 files: duckdb.h, duckdb.hpp, and libduckdb.so.
I unzipped the package and placed duckdb.h under the include directory and libduckdb.so under the lib directory.
build.zig.zon of A: libduckdb
Because libduckdb has no dependencies, the zon file is extremely simple.
It just lists the name and the version. I've intentionally been using the actual version number of the underlying DuckDb.
// build.zig.zon
// there are no dependencies
.{
    // note that we don't have to call this libduckdb
    .name = "duckdb",
    .version = "0.9.1",
}
build.zig of A: libduckdb
This is the first big change from Part 1. We are no longer building a fake artifact. We are only introducing some modules so that any package depending on this package can reference these items by their module names. This is still a hack because technically these items are artifacts, not modules, but at least we don't have to compile a shared library that doesn't need to be compiled.
const std = @import("std");

pub fn build(b: *std.Build) !void {
    // each "module" here is really just a named path inside this package:
    // two directories first, then the two individual files
    _ = b.addModule("libduckdb.lib", .{ .source_file = .{ .path = b.pathFromRoot("lib") } });
    _ = b.addModule("libduckdb.include", .{ .source_file = .{ .path = b.pathFromRoot("include") } });
    _ = b.addModule("duckdb.h", .{ .source_file = .{ .path = b.pathFromRoot("include/duckdb.h") } });
    _ = b.addModule("libduckdb.so", .{ .source_file = .{ .path = b.pathFromRoot("lib/libduckdb.so") } });
}
This will make more sense in the next sections.
B: zig-duckdb
The zig-duckdb is still a minimal Zig wrapper to DuckDb. It suits my needs for now and the only changes added since last time are the ability to query for boolean and optional values.
The big change is that we no longer need to install libduckdb.so or duckdb.h from libduckdb.
build.zig.zon of B: zig-duckdb
We do have a dependency now as we need to refer to a release of A: libduckdb.
// build.zig.zon
// Now we depend on a release of A: libduckdb
.{
    .name = "duck",
    .version = "0.0.5",
    .dependencies = .{
        // this is the name you want to use in the build.zig to reference this dependency
        // note that we didn't have to call this libduckdb or even duckdb
        .duckdb = .{
            .url = "https://github.com/beachglasslabs/libduckdb/archive/refs/tags/v0.9.1.3.tar.gz",
            .hash = "1220e182337ada061ebf86df2a73bda40e605561554f9dfebd6d1cd486a86c964e09",
        },
    },
}
build.zig of B: zig-duckdb
Note that we no longer install libduckdb.so or duckdb.h as part of the build process, as we previously had to do in Part 1.
We do have to call addModule multiple times, not only to expose the wrapper's own Zig source (the duck module, which goes alongside libduck.a, the artifact of this package) but also to re-export the modules provided by libduckdb.
Note how we now call duck_dep.builder.pathFromRoot(duck_dep.module("libduckdb.include").source_file.path) to access the include directory and duck_dep.builder.pathFromRoot(duck_dep.module("libduckdb.lib").source_file.path) to access the lib directory.
You can think of this as the equivalent of reaching inside libduckdb to access these items, so we no longer have to copy them into our output directory as we previously did with lib.installLibraryHeaders(duck_dep.artifact("duckdb")).
const std = @import("std");

pub fn build(b: *std.Build) !void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // "duckdb" is the dependency name declared in build.zig.zon
    const duck_dep = b.dependency("duckdb", .{});

    // this is our main wrapper file
    _ = b.addModule("duck", .{
        .source_file = .{ .path = "src/main.zig" },
    });

    // (re-)add modules from libduckdb
    _ = b.addModule("libduckdb.include", .{
        .source_file = .{ .path = duck_dep.builder.pathFromRoot(
            duck_dep.module("libduckdb.include").source_file.path,
        ) },
    });
    _ = b.addModule("libduckdb.lib", .{
        .source_file = .{ .path = duck_dep.builder.pathFromRoot(
            duck_dep.module("libduckdb.lib").source_file.path,
        ) },
    });
    _ = b.addModule("duckdb.h", .{
        .source_file = .{ .path = duck_dep.builder.pathFromRoot(
            duck_dep.module("duckdb.h").source_file.path,
        ) },
    });
    _ = b.addModule("libduckdb.so", .{
        .source_file = .{ .path = duck_dep.builder.pathFromRoot(
            duck_dep.module("libduckdb.so").source_file.path,
        ) },
    });

    const lib = b.addStaticLibrary(.{
        .name = "duck",
        // In this case the main source file is merely a path, however, in more
        // complicated build scripts, this could be a generated file.
        .root_source_file = .{ .path = "src/main.zig" },
        .target = target,
        .optimize = optimize,
    });

    // point the compiler and linker at the directories inside libduckdb
    lib.addLibraryPath(.{ .path = duck_dep.builder.pathFromRoot(
        duck_dep.module("libduckdb.lib").source_file.path,
    ) });
    lib.addIncludePath(.{ .path = duck_dep.builder.pathFromRoot(
        duck_dep.module("libduckdb.include").source_file.path,
    ) });
    lib.linkSystemLibraryName("duckdb");

    b.installArtifact(lib);
}
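Since the same duck_dep.builder.pathFromRoot(...) expression shows up many times, one option is a small helper. This is only a sketch: depPath is a name I made up and is not part of std.Build, and it assumes every module passed to it was created from a plain .path the way libduckdb does it.

```zig
const std = @import("std");

// Hypothetical helper (not part of std.Build): resolves the path that a
// dependency exposes through one of its "path modules".
// Assumes the module's source_file is the .path variant.
fn depPath(dep: *std.Build.Dependency, module_name: []const u8) []const u8 {
    return dep.builder.pathFromRoot(dep.module(module_name).source_file.path);
}

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const duck_dep = b.dependency("duckdb", .{});

    const lib = b.addStaticLibrary(.{
        .name = "duck",
        .root_source_file = .{ .path = "src/main.zig" },
        .target = target,
        .optimize = optimize,
    });

    // Same calls as before, with the path lookup factored out.
    lib.addLibraryPath(.{ .path = depPath(duck_dep, "libduckdb.lib") });
    lib.addIncludePath(.{ .path = depPath(duck_dep, "libduckdb.include") });
    lib.linkSystemLibraryName("duckdb");

    b.installArtifact(lib);
}
```

The behavior should be the same as the longer version; the helper only trims the repetition, and it will trip a safety check if a module was created from something other than a plain path.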
Note that if you really want to install libduckdb.so, for example, you can do so with the following call:
_ = b.installLibFile(duck_dep.builder.pathFromRoot(
    duck_dep.module("libduckdb.so").source_file.path,
), "libduckdb.so");
If you look into the project, you will see that I introduced a new file called test.zig that is meant to test the new boolean and optional values.
In order to run the test, I've added a new test step in build.zig:
const unit_tests = b.addTest(.{
    .root_source_file = .{ .path = "src/test.zig" },
    .target = target,
    .optimize = optimize,
});
unit_tests.step.dependOn(b.getInstallStep());
unit_tests.linkLibC();
// note how I use modules to access these directories
unit_tests.addLibraryPath(.{ .path = duck_dep.builder.pathFromRoot(
    duck_dep.module("libduckdb.lib").source_file.path,
) });
unit_tests.addIncludePath(.{ .path = duck_dep.builder.pathFromRoot(
    duck_dep.module("libduckdb.include").source_file.path,
) });
unit_tests.linkSystemLibraryName("duckdb");

const run_unit_tests = b.addRunArtifact(unit_tests);
run_unit_tests.setEnvironmentVariable("LD_LIBRARY_PATH", duck_dep.builder.pathFromRoot(
    duck_dep.module("libduckdb.lib").source_file.path,
));

const test_step = b.step("test", "Run unit tests");
test_step.dependOn(&run_unit_tests.step);
Once again, you can see why I've exposed the lib and include directories of libduckdb via modules.
I can now call addIncludePath and addLibraryPath by referencing their modules.
Note the call to setEnvironmentVariable: -L is only useful for linking, not for running the test/program. Hence you need to point to libduckdb.so using LD_LIBRARY_PATH, once again by accessing the location of the shared library inside the libduckdb package.
C: my-wtf-project
Now to create the executable for our project, we need to link to packages A (libduckdb) and B (zig-duckdb).
build.zig.zon of C: my-wtf-project
Our only dependency is the release of B: zig-duckdb.
// build.zig.zon
// Now we depend on a release of B: zig-duckdb
.{
    // this is the name of our own project
    .name = "my-wtf-project",
    // this is the version of our own project
    .version = "0.0.2",
    .dependencies = .{
        // we depend on the duck package described in B
        .duck = .{
            .url = "https://github.com/beachglasslabs/zig-duckdb/archive/refs/tags/v0.0.5.tar.gz",
            .hash = "1220fe38df4d196b7aeca68ee6de3f7b36f1424196466038000f7485113cf704f478",
        },
    },
}
build.zig of C: my-wtf-project
This is somewhat similar to the build.zig of B (zig-duckdb).
Note once again that we no longer need to call installLibraryHeaders to install libduckdb.so and duckdb.h.
I've also added setEnvironmentVariable to set LD_LIBRARY_PATH for running the test program.
const std = @import("std");

pub fn build(b: *std.Build) !void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const exe = b.addExecutable(.{
        .name = "my-wtf-project",
        .root_source_file = .{ .path = "testzon.zig" },
        .target = target,
        .optimize = optimize,
    });

    // "duck" is the dependency name declared in build.zig.zon
    const duck = b.dependency("duck", .{
        .target = target,
        .optimize = optimize,
    });
    exe.addModule("duck", duck.module("duck"));
    exe.linkLibrary(duck.artifact("duck"));

    // these modules were re-exported by zig-duckdb from libduckdb
    exe.addIncludePath(.{ .path = duck.builder.pathFromRoot(
        duck.module("libduckdb.include").source_file.path,
    ) });
    exe.addLibraryPath(.{ .path = duck.builder.pathFromRoot(
        duck.module("libduckdb.lib").source_file.path,
    ) });

    // You'll get segmentation fault if you don't link with libC
    exe.linkLibC();
    exe.linkSystemLibraryName("duckdb");

    b.installArtifact(exe);

    const run_cmd = b.addRunArtifact(exe);
    run_cmd.step.dependOn(b.getInstallStep());
    // you must set the LD_LIBRARY_PATH to find libduckdb.so
    run_cmd.setEnvironmentVariable("LD_LIBRARY_PATH", duck.builder.pathFromRoot(
        duck.module("libduckdb.lib").source_file.path,
    ));

    const run_step = b.step("run", "Run the test");
    run_step.dependOn(&run_cmd.step);
}
You can now just call zig build run to run the test program because we already set LD_LIBRARY_PATH using setEnvironmentVariable in our build.zig.
I ~/w/z/wtf-zig-zon-2 6m 10.7s ❱ zig build run
info: duckdb: opened in-memory db
info: duckdb: db connected
debug: duckdb: query sql select * from pragma_version();
Database version is v0.9.1
STOPPED!
Leaks detected: false
I ~/w/z/wtf-zig-zon-2 4.1s ❱
When I mentioned reaching inside the package, what happens behind the scenes is that the package lives in ~/.cache/zig, so all this magic with modules is really just specifying paths to particular packages under ~/.cache/zig.
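To illustrate what those module paths boil down to, here is a small standalone sketch that joins a global cache directory, the p subdirectory, a package hash, and a subdirectory. The helper name cachedPackagePath is made up, and the layout is only what I observed on my machine with Zig 0.11.0; treat it as an implementation detail that may change, not a documented interface.

```zig
const std = @import("std");

// Hypothetical helper: builds "<global cache>/p/<package hash>/<sub path>",
// the layout observed in Zig 0.11.0's global cache.
fn cachedPackagePath(
    allocator: std.mem.Allocator,
    global_cache_dir: []const u8,
    package_hash: []const u8,
    sub_path: []const u8,
) ![]u8 {
    return std.fs.path.join(allocator, &.{ global_cache_dir, "p", package_hash, sub_path });
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // The cache directory is from my machine; the hash is the one
    // listed for libduckdb in the build.zig.zon of zig-duckdb.
    const lib_dir = try cachedPackagePath(
        allocator,
        "/home/ed/.cache/zig",
        "1220e182337ada061ebf86df2a73bda40e605561554f9dfebd6d1cd486a86c964e09",
        "lib",
    );
    defer allocator.free(lib_dir);

    std.debug.print("{s}\n", .{lib_dir});
}
```

On Linux, the resulting path is the lib directory inside the cached libduckdb package, which is what ends up in -L and LD_LIBRARY_PATH.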
You can see more clearly what's going on if you add --verbose to your zig build commands.
I ~/w/z/wtf-zig-zon-2 4.1s ❱ zig build run --verbose
/snap/zig/8241/zig build-lib /home/ed/.cache/zig/p/1220fe38df4d196b7aeca68ee6de3f7b36f1424196466038000f7485113cf704f478/src/main.zig -lduckdb --cache-dir /home/ed/ws/zig/wtf-zig-zon-2/zig-cache --global-cache-dir /home/ed/.cache/zig --name duck -static -target native-native -mcpu znver3-mwaitx-pku+shstk-wbnoinvd -I /home/ed/.cache/zig/p/1220e182337ada061ebf86df2a73bda40e605561554f9dfebd6d1cd486a86c964e09/include -L /home/ed/.cache/zig/p/1220e182337ada061ebf86df2a73bda40e605561554f9dfebd6d1cd486a86c964e09/lib --listen=-
/snap/zig/8241/zig build-exe /home/ed/ws/zig/wtf-zig-zon-2/testzon.zig /home/ed/ws/zig/wtf-zig-zon-2/zig-cache/o/b893f00994b9c79eab2c150de991b233/libduck.a -lduckdb -lduckdb -lc --cache-dir /home/ed/ws/zig/wtf-zig-zon-2/zig-cache --global-cache-dir /home/ed/.cache/zig --name my-wtf-project --mod duck::/home/ed/.cache/zig/p/1220fe38df4d196b7aeca68ee6de3f7b36f1424196466038000f7485113cf704f478/src/main.zig --deps duck -I /home/ed/.cache/zig/p/1220e182337ada061ebf86df2a73bda40e605561554f9dfebd6d1cd486a86c964e09/include -L /home/ed/.cache/zig/p/1220e182337ada061ebf86df2a73bda40e605561554f9dfebd6d1cd486a86c964e09/lib --listen=-
LD_LIBRARY_PATH=/home/ed/.cache/zig/p/1220e182337ada061ebf86df2a73bda40e605561554f9dfebd6d1cd486a86c964e09/lib /home/ed/ws/zig/wtf-zig-zon-2/zig-out/bin/my-wtf-project
info: duckdb: opened in-memory db
info: duckdb: db connected
debug: duckdb: query sql select * from pragma_version();
Database version is v0.9.1
STOPPED!
Leaks detected: false
I ~/w/z/wtf-zig-zon-2 ❱
Part 1 is here.
You can find the code here.
Here is the code for zig-duckdb and libduckdb.
Special thanks to @InKryption for helping out on the new hack for the Zig package manager!