HN.zip

A tale of distros joining forces for a common goal: reproducible builds [video]

131 points by todsacerdoti - 57 comments
anotherhue [3 hidden]5 mins ago
Reproducibility changes everything, because nothing changes. We go from shamans chanting incantations over a blessed code base to a mathematical function with an algebra of system composition.

Let me give you the simplest example: when builds are reproducible, you don't need package repositories, you need build caches.

All the problems with maintaining a repository (save bandwidth) evaporate.

okanat [3 hidden]5 mins ago
> Let me give you the simplest example: when builds are reproducible, you don't need package repositories, you need build caches.

The type of reproducibility is different here. What you mention is already possible via a stable compiler ABI, provided the source code stays the same. Without a stable compiler ABI, you may or may not get matching output, depending on what the compiler does.

The goal of reproducible builds is removing sources of environment-dependent behavior at the build level instead of the compiler level. So given all the same dependencies and same build commands your binaries should match wherever and whenever you compile them. The distros and software developers also made a huge effort to remove any kind of environment-dependent commands.
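The bit-for-bit property okanat describes is easy to check mechanically. The following toy sketch uses `gzip -n` (which omits the embedded timestamp from its header) purely as a stand-in for a reproducible build step; everything else is plain coreutils:

```shell
# Simulate a "source tree" and a reproducible build step.
mkdir -p demo
printf 'int main(void){return 0;}\n' > demo/main.c

# Build twice; gzip -n omits the embedded mtime, so the output is bit-identical.
gzip -nc demo/main.c > build1.gz
gzip -nc demo/main.c > build2.gz

h1=$(sha256sum build1.gz | cut -d' ' -f1)
h2=$(sha256sum build2.gz | cut -d' ' -f1)

# With reproducible output, anyone can validate a cached artifact
# by rebuilding and comparing hashes.
[ "$h1" = "$h2" ] && echo "REPRODUCIBLE"
```

Any observer can rerun the build and compare hashes; that comparison is exactly what a build-cache entry would be validated against.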

Different distros still have differences in the build commands they issue and the set of dependencies they enable. The space to cache each individual possible output would be enormous and impractical. So you would still need repositories.

anotherhue [3 hidden]5 mins ago
All true, but I can't not take this opportunity to shill NixOS which has meaningfully addressed many of these issues, and is indeed spending impractical amounts of money storing build outputs ($10k/m).

https://discourse.nixos.org/t/the-nixos-foundations-call-to-...

It is absolutely better to remove build entropy at the source code stage, but until all software is written that way there are a few build-environment tricks we can use along the way.

zelphirkalt [3 hidden]5 mins ago
I highly doubt that we will get anywhere close to developers understanding and valuing reproducibility any time soon. Maybe in 20y or so. Basically, outside of corners like Nix and Guix, and maybe a few random people in discussions about package manager issues, I have not met anyone who knows how to achieve reproducibility or cares about it.

Meanwhile I enjoy setting up my own GNU Guile projects with a Makefile that sets up a reproducible Guix shell in which my project is run, so that I get the same result on various devices, only needing to issue a single easily memorized or discoverable Makefile target call. Most developers I met don't know how to set something like this up. Provided no one messes with guix package manager commits and guix infrastructure still exists, my projects will run in 10y just like they do today, with reproducible result. Neat.

Recently I have taken some time to set this up for OCaml as well. It took some asking on the mailing list, but it works now. Nothing needs to be installed prior to running the Makefile target other than Make and the Guix package manager.
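zelphirkalt's actual Makefile isn't shown; a minimal hypothetical sketch in that spirit would pin Guix to a fixed channel commit with `guix time-machine` and run the build inside an isolated container shell, so only declared inputs are visible (the target name, file name, and commit hash below are invented placeholders):

```make
# Hypothetical sketch: pin the Guix revision, then run inside a container
# shell so the build sees only declared inputs. Recipe lines need tabs.
GUIX_COMMIT = 0123456789abcdef0123456789abcdef01234567  # placeholder revision

run:
	guix time-machine --commit=$(GUIX_COMMIT) -- \
	  shell --container guile -- \
	  guile main.scm
```

Pinning the commit is what makes the environment the same across devices and across years, as long as the Guix infrastructure can still serve that revision.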

patrakov [3 hidden]5 mins ago
> Basically outside of the corners like Nix, Guix and maybe a few random people in discussions about issues of package managers, I have not met anyone knowing how to and caring about reproducibility.

And I did. He was a developer of a purportedly secure messaging app. The question that changed his mind was: "How are your users, who surely audited your sources and found no trojans hiding there, going to know that you are not distributing trojaned official binary builds?"

anotherhue [3 hidden]5 mins ago
I assume you tried Flakes also? Does Guile have a better approach to this problem (other than the general language difference; everyone hates Nix)?

Edit: Worth looking at https://github.com/numtide/devshell

zelphirkalt [3 hidden]5 mins ago
I have not tried Flakes. Guile's main way of installing packages, I believe, is using Guix package manager. And Guix is written using Guile.
transpute [3 hidden]5 mins ago
Does Haskell (used by NixOS) have a fully reproducible build chain?
jimmaswell [3 hidden]5 mins ago
> The space to cache each individual possible output would be enormous and impractical. So you would still need repositories.

I've been using Gentoo for the past few weeks and this was my thought. There are so many ways to compile packages depending on your needs. It is practical to make binary sets for a few common configurations that are good enough for most people for each distro, though.

(Installing and using Gentoo has been an incredible learning experience. You have to go into it with the desire to take long tangents filling the gaps in your knowledge re: shared libraries, kernel modules, your init system of choice, gcc, your windowing system of choice, bootloaders, bash, etc. I feel like I've made more of a quantum leap in my Linux skills the past month than I have in years, and it's been quite fun and rewarding.)

> Different distros still have differences in the build commands they issue and the set of dependencies they enable.

Would it even make sense in theory for this not to be the case? What is a Linux distro but a set of programs, libraries, and environment choices chosen to be run on the Linux kernel?

kbaker [3 hidden]5 mins ago
The Yocto Project (from the embedded space) has reproducibility as one of its goals. Since everything is built from source and from scratch, even the build toolchains, it is not too big of a step.

Seems like Yocto would make the base for a good general purpose desktop distro (if there is not one out there already.)

rcxdude [3 hidden]5 mins ago
Yocto was kinda doing the nix thing for a while before nix existed, but basically by slowly growing the capabilities in an ad-hoc fashion instead of working on it from first principles. It's resulted in a bit of a mess (it's an unholy mix of a custom functional-ish programming language grown out of a config file format, python, and bash, with a ludicrous capacity for action-at-distance) but there's still not really anything else like it.
transpute [3 hidden]5 mins ago
> unholy mix ... ludicrous capacity for action-at-distance

OE recipe-grade concision! This summary belongs in YP docs, Wikipedia, and LLM IDE warnings.

transpute [3 hidden]5 mins ago
Yocto still has dependencies on a bootstrap build distro.

Stagex + Yocto would be fully reproducible from a small seed.

> Yocto would make the base for a good general purpose desktop distro

YP has been working on a public binary reference cache/distro.

blueflow [3 hidden]5 mins ago
Do they evaporate? My impression was that they got shifted to the package recipes. Like Nix having to take the same considerations with channels.
BiteCode_dev [3 hidden]5 mins ago
You mean you don't need dep resolution, signatures and moderation? How so?
anotherhue [3 hidden]5 mins ago
Good questions:

1. If the deps are also themselves reproducible then you refer to a fixed point version of them and (at least for nix) the package manager works out the rest.

2. Signatures are a trust mechanism. If a cache feeds you bad data in response to a query, that's absolutely an issue, but since there can be multiple caches (or your own local spot-checks) it becomes easier to detect a cache returning bad data. A hyper-targeted attack would still get you unless you decide to manually build certain packages, but that's no different from existing repos.

Manually building might sound impractical, but it doesn't actually take that long: probably less than a day for a desktop environment, which might be acceptable in a high-trust environment if amortized. I should add that the process is fully automatic, it just takes longer than using a cache.

3. Moderation I don't have a good answer to, anyone can run an apt server, or publish a flake.nix file to a repo. Some would say it is censorship resistant.

LtWorf [3 hidden]5 mins ago
He hasn't understood what reproducible builds are.
XorNot [3 hidden]5 mins ago
This doesn't work at all: a build cache is meaningless in this context, because the only thing you can do is rebuild the code to verify that what's in the cache is what comes from the code you expect... or bootstrap the whole compiler chain, then use cryptographic signatures to chain trust from a matching compiler hash up to all the dependent outputs.

It certainly doesn't change the nature of packaging in any real way in that case.

anotherhue [3 hidden]5 mins ago
Others can rebuild the chain and the results can be compared. Without reproducibility there is no concept of comparison.
XorNot [3 hidden]5 mins ago
Which is still a cryptographic trust system - i.e. who are the others who built it and are they suitably independent or working from similar sources?
vlovich123 [3 hidden]5 mins ago
It turns out that conspiracies are really hard to maintain the larger the network gets. All it takes is one person to run the build, get different results, and go "guys, I'm noticing something weird" to start the ball rolling. As a proof point, look at what happened to the attack on ssh by way of the xz-utils backdoor.
gosub100 [3 hidden]5 mins ago
It's not just about detecting malicious actions; binary reproducibility helps solve "works on my machine" bugs, because you'll get a different hash if you, say, link against a library you didn't think was being selected.
kpcyrd [3 hidden]5 mins ago
hello, I'm one of the speakers. I've been working on this since 2017, happy to answer any questions the hackernews crowd might have.
algo_trader [3 hidden]5 mins ago
looking at [1][2] are the slides available online? i am on chrome, getting just static html

[1] https://salsa.debian.org/reproducible-builds/reproducible-pr... [2] https://fosdem.org/2025/schedule/event/fosdem-2025-6479-a-ta...

foobazfoo [3 hidden]5 mins ago
Fantastic project. Thank you for all your efforts.

Regarding the reproducible bootstrapping problem, what is your project's policy on building from binary sources? For instance, Zig is written in zig and bootstraps from a binary wasm file which is translated to C: https://github.com/ziglang/zig/tree/master/stage1

Golang has an even more complicated bootstrapping procedure, requiring you to build each successive version of the compiler to get to the most recent one.

pabs3 [3 hidden]5 mins ago
See the Bootstrappable Builds community. They do not allow bootstrapping that uses pre-generated files (binary or otherwise), except for an MBR's worth of commented machine code in hex.

https://bootstrappable.org/ https://lwn.net/Articles/983340/

kpcyrd [3 hidden]5 mins ago
Thanks! The kind of work I do is about making an existing operating system issue reproducible packages, to the point that you can install a system with reproducible-only packages. This assumes "trusted source code and compiler", but no more tampering by the build server, which is already quite the improvement from what we have right now.

To solve the need for trusted compilers (aka bootstrap from binary seeds) you're probably interested in https://bootstrappable.org/ and https://codeberg.org/stagex/stagex.

To solve the need for trusted source code there isn't really any solution besides "have people publicly document the source code they have read", like https://github.com/crev-dev/cargo-crev does. Often people ask "how do I know whose reviews to trust", but in reality there's a scarcity of reviews even if you're willing to trust literally anybody. There aren't really any incentives for people to make them, capitalism is failing us on that front and big companies don't want to publicly talk about the source code they have and haven't read either.

jmclnx [3 hidden]5 mins ago
It is very cool to see distros working together for a common goal.

But I still do not understand the point of "reproducible builds". I know what they are, but to me the amount of work involved outweighs the benefit.

I even heard NetBSD is also working on "reproducible builds". So maybe I am missing something :)

noirscape [3 hidden]5 mins ago
Practically speaking, the idea of a reproducible build is that you can take the source files they used, run their build instructions the same way they did, and get, hash for hash[0], the exact same resulting executable.

The main benefit is that you can trust that the binary being served matches the source code it's built from. This mostly matters for distros because they build from a source package repository, and pretty much every distro out there runs on third-party mirrors (often run by universities, but also just people who want to help) rather than direct upstream: packages get uploaded to a main server, then mirrors copy from that main server (to reduce network traffic load on it). But anyone running a mirror could hypothetically replace the package with another (potentially malicious) package, leading users to install malicious tooling. Right now, mirror trust is mostly "we assume you're not gonna be evil, until we get complaints". If the build is reproducible, the software can inherently confirm that the file it's getting is trustworthy, making "getting complaints" much easier to confirm.

It can also speed up the overall building process; if the package source code hasn't changed, you can safely assume that the resulting binary hasn't changed either (meaning you can use hashes instead of relying on mtime like make does). Docker's build cache works in a somewhat similar way (although Docker isn't inherently deterministic).
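The hash-vs-mtime point can be sketched as a toy content-addressed cache; the `tr` call below is an invented stand-in for a real compile step:

```shell
# Toy content-addressed build cache: key the output by the source hash,
# not by mtime (which can change even when the content hasn't).
mkdir -p cache src
printf 'hello\n' > src/input.txt

key=$(sha256sum src/input.txt | cut -d' ' -f1)
if [ -f "cache/$key" ]; then
    echo "cache hit: reuse cache/$key"
else
    # "Compile": any deterministic transformation works for the demo.
    tr a-z A-Z < src/input.txt > "cache/$key"
    echo "cache miss: built cache/$key"
fi
```

Note that `touch src/input.txt` would force make to rebuild, while the hash key above stays stable because the content is unchanged.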

Devwise, you can also reconstruct a build much more easily if it's reproducible; e.g. if you've accidentally thrown away the .elf file you need for debugging, and your build is deterministic, you can just rerun the build and get the same .elf file again.

[0]: While not a problem for Linux distros, in cases where you need a secret to sign an application, reproducible typically means "identical except for the signature" instead. F-Droid uses this for example to figure out if they should use buildserver stuff or the original APKs: https://f-droid.org/docs/Reproducible_Builds/

yellow_lead [3 hidden]5 mins ago
> a mirror could hypothetically replace the package with another (potentially malicious) package, leading users to install malicious tooling.

It was my assumption that a mirror is required to host a build that has a hash conforming to the original. Is that not the case?

jerf [3 hidden]5 mins ago
Yes, the real attack isn't that mirrors change the files, the real attack is that just because a distro packages Binary X and Source X, it is difficult without reproducible builds to prove that Source X actually did produce Binary X. It could have been compiled with a trojan in it between the source and binary.
gruez [3 hidden]5 mins ago
More specifically the packages are signed by the distro and automatically checked, so a mirror can't go rogue even if it wanted to.
SkiFire13 [3 hidden]5 mins ago
> meaning you can use hashes instead of relying on mtime like make does

Note that mtime still has the advantage of being faster than hashing.

jcranmer [3 hidden]5 mins ago
The main benefit you'll hear touted is something along the lines of being able to get an attestation that the resulting artifact was built following the steps claimed to build it. I think that's a somewhat overstated benefit, though, as it's not clear to me that this is an avenue of attack used in practice, given the frequency with which software already has vulnerabilities usable for exploits, or the ease with which one can insert a backdoor into the source code (e.g., the xz backdoor).

I think the actual main utility is that the process has done a very good job of rooting out several causes of unintentional nondeterminism in the build process. I say unintentional because the two main causes of unreproducibility, by several orders of magnitude, are timestamps being embedded everywhere and absolute paths being embedded everywhere, and those are rather expected. But some of the unreproducibility comes from things like accidental reliance on inodes in file paths (i.e., doing "for file in listdir()" without sorting the results of listdir) or the compiler itself accidentally sorting based on pointer address (which is unreproducible on ASLR systems).
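The first two causes (timestamps and directory order) are easy to demonstrate with GNU tar, whose reproducibility flags exist precisely to pin file order, mtimes, and ownership:

```shell
# Unsorted readdir order and live mtimes make archives nondeterministic;
# pinning both makes independent runs bit-identical (assumes GNU tar).
mkdir -p tree
touch tree/b tree/a tree/c

deterministic_tar() {
    tar --sort=name --mtime='@0' --owner=0 --group=0 --numeric-owner \
        -cf "$1" tree
}

deterministic_tar out1.tar
deterministic_tar out2.tar

# The two archives hash identically.
sha256sum out1.tar out2.tar
```

Without `--sort=name`, the member order follows whatever the filesystem returns, which is exactly the `listdir()`-without-sorting problem described above.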

uecker [3 hidden]5 mins ago
The xz backdoor was news because they went to a lot of effort to try to hide a backdoor in the source (actually a binary file in the source) and still failed. In contrast, without reproducible builds it is trivial for a maintainer with upload rights (or somebody who managed to get the credentials from a maintainer) to insert a backdoor into a binary. And it is then virtually impossible to detect.
rcxdude [3 hidden]5 mins ago
Well, the xz backdoor was detected through the behaviour of the resulting binaries, not through observation of the source code tampering, so I don't think it's a great example.
uecker [3 hidden]5 mins ago
Fair point. It was pure luck that it was discovered. What I was mainly referring to was how much effort it took to get the backdoor installed in the first place, compared to simply uploading a compromised binary somewhere.
drowsspa [3 hidden]5 mins ago
Yeah, I would say the xz backdoor succeeded summa cum laude.
david-gpu [3 hidden]5 mins ago
It's a safety measure. Reproducible builds ensure identical binaries are produced from the same source. They help detect e.g. hidden backdoors.
3s [3 hidden]5 mins ago
A really important application of reproducible builds is running code inside Secure Enclaves that has been committed to on a public transparency log. A client can connect to a remote secure enclave, which can then prove to the client that it's running the committed code via a process known as remote attestation. It's pretty cool stuff. However, it's only possible if the build inside the enclave is reproducible (deterministic) and always identical to the build on the transparency log.
champtar [3 hidden]5 mins ago
With reproducible builds you know that what you test on your dev laptop is the same as what will come out of your CI, and if the hashes mismatch you can chase down why. For a concrete example: Mellanox's driver configure script auto-detects whether it's running under Docker and changes compile flags accordingly, so if you build in a container using podman you get a different result.
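That kind of environment-sniffing configure step can be mimicked with a mock build script; the `.in_container` marker below is an invented stand-in for Docker's `/.dockerenv`:

```shell
# Mock of an environment-sniffing configure step: the output depends on
# whether a marker file exists, so builds differ across environments.
build() {
    if [ -e ".in_container" ]; then
        flags="-DIN_CONTAINER"
    else
        flags=""
    fi
    printf 'CFLAGS=%s\n' "$flags" > "$1"
}

build host.cfg        # "laptop" build
touch .in_container
build ci.cfg          # "CI container" build

# Same source, different environment: the hashes differ.
sha256sum host.cfg ci.cfg
```

Reproducibility work consists largely of hunting down exactly this pattern and replacing implicit environment probes with explicit, declared inputs.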
solarkraft [3 hidden]5 mins ago
> but to me the amount of work involved outweighs the benefit

I don’t know whether I’d spend this much work on such an abstract goal, but what reproducibility changes really is quite amazing. It vastly increases trust in published binaries, and obviates both the need for signing and the security benefit of compiling software yourself.

gruez [3 hidden]5 mins ago
>obviates the need for signing and the security benefit of compiling software yourself

Not really. Most people would still rely on signatures, because they can't be expected to compile everything from scratch just to verify their download is authentic. Moreover, even though reproducible builds make verification easier, someone still has to sound the alarm. For less popular packages there might be nobody checking whether any particular build is backdoored, because most people see "reproducible builds" and assume Somebody Else is doing the reproduction.

mjl- [3 hidden]5 mins ago
for transparency of reproducible builds of go applications, i made https://beta.gobuilds.org/. it compiles any publicly available go application on-demand, with a toolchain version of your choice (latest stable by default), for a platform of your choice. all (pure) go applications are reproducible by default, including when cross-compiled, and go toolchains run nothing provided by the go module (awesome properties!). the source code is verified through the go sum database (a transparency log containing go modules), and the hash of the resulting binary is added to gobuild's own transparency log, so it can be publicly verified.

the gobuilds service builds the binary itself, and has another instance (on a different platform & config) build the binary too, to ensure the binary is really reproducible (i'd like other instances that i don't run myself as secondaries too).

i no longer publish binaries for my applications (that i write in go). i just point to the "latest"-build link for the go module at gobuilds. this also makes it easy for users (including myself) to get new builds for new go toolchains (which may include fixes to the (relatively large, and often used) standard library).

you still may not trust the public gobuilds instance. my hope is that people (eg software projects themselves, or distros, or other kinds of communities) will run & use their own gobuild instances and verify their builds against the public gobuilds service. win-win: gives them assurance their builds are really reproducible, and builds trust in the public gobuilds (keeping it honest, if someone sees a hash mismatch, they will speak up).

i usually don't get much enthusiasm for it though. (:

ssivark [3 hidden]5 mins ago
What makes you so confident that the benefit is less than the effort?

Given the increasing likelihood of supply chain attacks, isn’t this a very prudent precaution?

samsartor [3 hidden]5 mins ago
The video gets into that. The main purpose is to verify that the binary you're running came from the actual source code.
NegativeLatency [3 hidden]5 mins ago
Would make stuff like this harder to pull off: https://en.wikipedia.org/wiki/XZ_Utils_backdoor
nindalf [3 hidden]5 mins ago
Are you sure?

If I'm understanding correctly, the malicious code was introduced as part of the test code, so no matter who compiled it, they'd get a binary with the same (malicious) functionality. Heck, it might even have been reproducibly malicious.

The real crazy part was that it was modifying the functionality of sshd at runtime, allowing the attacker to log into any system.

Reproducibility of either sshd or xz wouldn't have stopped this attack.

That's my reading of https://research.swtch.com/xz-script.

solarkraft [3 hidden]5 mins ago
I tend to agree. The exploit was (by detour) committed to the source.
ramses0 [3 hidden]5 mins ago
Technically, interestingly, if `bazel` were used as the build tool, it would avoid the straightforward ability to cross-contaminate the build executable with the test code...

Yeah, `bazel run test:...` would have access to the test files, but `bazel build xz:executable` would not (by default) be able to pull in extra shenanigans from the test files (and I think there's generally linting and formatting rules required by default with `BUILD.bazel` files, reducing another sneak-vectors)

kpcyrd [3 hidden]5 mins ago
It unfortunately doesn't help in cases like this. Reproducible Builds gives you a trusted path from source to binary, but it doesn't help with backdoors in the source code/build instructions.

For that we'd need some sort of source code reviewing effort like https://github.com/crev-dev/cargo-crev implements. I've started whatsrc.org to keep track of the source code inputs we're putting into our computers (that would benefit from reviews), but the conclusion is also somewhat "it's too much".

zelphirkalt [3 hidden]5 mins ago
It is also about reliably being able to build things a month from now, in a year, in 5 years, etc.
pabs3 [3 hidden]5 mins ago
I wonder how common reproducible builds are outside of the distro bubble. I guess PyPI isn't looking at it yet for example.
pabs3 [3 hidden]5 mins ago
I'm looking forward to more distros adopting Bootstrappable Builds, so far I think only Guix has.

https://bootstrappable.org/ https://lwn.net/Articles/983340/

pelasaco [3 hidden]5 mins ago
I think for a distro like Talos Linux, with only 12 binaries, it will be much easier to accomplish.
lowkey [3 hidden]5 mins ago
Original link didn't work for me. Here is the source: https://fosdem.org/2025/schedule/event/fosdem-2025-6479-a-ta...