Last week the Reproducible Builds Summit was held in Berlin, and since it was almost on my doorstep I had to take part. Along with Eelco Dolstra, I represented the NixOS voice in the debates that were happening at the summit.
When is a build reproducible?
There were quite a few discussion sessions trying to define when a piece of software is reproducible. While it is important to have a definition in words, I was hoping that there would be a simple tool that could tell me this.
Such a tool might never exist. What I realized during the summit is that reproducibility is not something that is simply true or false, but something that we assume is true until somebody disproves it. Reproducibility is a goal which we are always working towards, just like security.
Just as there are zero-day vulnerabilities, there should probably be zero-day reproducibility bugs.
Making sure that our software is reproducible would follow practices similar to those we already have for security: third-party auditing, a CVE-like database of reproducibility bugs, ...
Knowing if something is reproducible is not a simple yes/no question; it is a process you need to follow. Yay, I am not too old to learn something :)
Different kinds of reproducibility
For the purpose of this blog post I would like to point out that there are - at least - two kinds of reproducibility.
When we talked about reproducibility at the summit we were, of course, referring to bit-by-bit reproducibility or, as I will call it in this blog post, binary reproducibility.
And then there is what I like to call build reproducibility, which only ensures the reproducibility of build environments (e.g. the versions of the tools in the build environments are the same).
The purpose of the Reproducible Builds effort is, of course, to be binary reproducible, but to build something that is bit-by-bit identical you need to use the same versions of the tools, which makes build reproducibility a pre-step of binary reproducibility.
Build reproducibility is a prerequisite of binary reproducibility.
Why this distinction matters I will explain in a bit, but for now just acknowledge this naming.
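To make the two kinds a bit more concrete, here is a minimal Python sketch. It is only an illustration: the function names, the tool versions and the "build outputs" are all made up, and a real check would compare actual files produced by two independent builds.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a build artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def same_environment(env_a: dict, env_b: dict) -> bool:
    """Build reproducibility: both builds used the same tool versions."""
    return env_a == env_b


def same_binary(artifact_a: bytes, artifact_b: bytes) -> bool:
    """Binary reproducibility: both outputs are bit-by-bit identical."""
    return sha256_of(artifact_a) == sha256_of(artifact_b)


# Hypothetical tool versions and outputs, for illustration only.
env = {"gcc": "5.2.0", "make": "4.1"}
first = b"program built at 2015-12-01"
second = b"program built at 2015-12-02"  # an embedded timestamp differs

print(same_environment(env, dict(env)))  # True
print(same_binary(first, second))        # False
```

In this made-up case the environments match, yet the outputs differ because of an embedded timestamp: build reproducibility holds, but binary reproducibility does not.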
We are all biased
A leader in the Reproducible Builds effort is the Debian community. You can see that the Debian community is working hard on this, and many Debian developers were present at the summit.
Getting involvement from outside the Debian community is high on the list, since everybody realizes that only with a common effort will we be able to achieve reproducibility nirvana.
But regardless of all the good intentions, I noticed two biases that I would like to point out.
- Many of us look down on language-specific package managers (e.g. pip, cabal, ...) as being less worthy, usually talking about them along the lines of: "Who in their right mind would use the latest versions of packages?" It sounds like the usual developer vs. sysadmin conversation. I hope that at the next summit representatives from some of the language-specific package managers will join the discussions.
- I got the impression that the sole reason for reproducible builds is that you would be more secure. That implies that everybody cares about security. Which would be great, but in a world of tight deadlines and startups, security is usually the first thing that gets crossed off the list. We need to make a more compelling case than just security. I am aware that security is important to many, but we must also understand that it is not the top priority for everybody.
I probably missed some because of my own biases, but the only way to overcome such a thing is to try to expose each other's biases.
Why would you care about reproducible builds?
My personal quest for this summit was to find better ways to market reproducible builds. I found that many who tried or want to introduce reproducible builds at work fail for a few reasons:
- you need to opt in and switch from tools that you are used to.
- many tools you are already using were not built with reproducibility in mind.
- reproducibility often sounds like all or nothing.
- the prevalent (marketed) benefit of reproducible builds is better security, and not everybody requires that level of security.
- a high cost (usually in developer hours) is usually required.
Reproducibility as a productivity tool
What if we turned the marketing of reproducible builds around? What if the main (marketed) reason for reproducible builds were to improve developer productivity?
Earlier I explained that reproducibility is not a simple yes/no question, but a process which one must follow. Likewise, the path to reproducible builds is not a simple on/off switch. There are many steps on the way to binary reproducibility that already improve current development practices while getting us closer to binary reproducibility.
One example of this could be reproducible development environments. If we have tools that can recreate any environment, couldn't we also use them for development purposes?
And we don't have to stop here. Isn't the most expensive part of fixing bugs reproducing them? Couldn't build reproducibility also help us with reproducing certain types of bugs?
Maybe by now it has become a little clearer that there are many benefits along the way to binary reproducibility that might be more convincing for some companies.
Cross platform build tool
Another discussion I watched from across the room was about the BuildInfo specification. As part of reproducibly building Debian packages, a BuildInfo file is (will be) produced as well, which has all the needed instructions, sources and final checksums, and allows somebody else to verify (reproduce) the resulting binary.
I was not alone in thinking that this verification process should/could be distribution agnostic. A group was even formed to discuss this, but sadly I was too busy in other discussions to take part.
But then I realized that the BuildInfo effort is actually changing a binary distribution like Debian into a source -> binary like distribution. Why produce the BuildInfo file after the build process? Why not start with it and only record the checksums of the binaries after the build is done?
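The "start with the description, add the checksums later" idea can be sketched in a few lines of Python. This is not the Debian BuildInfo format and not how Nix works internally; the field names, helper functions and example values below are hypothetical and only show the order of the steps.

```python
import hashlib


def describe_build(name, sources, tools):
    """Write down everything needed for the build before it starts."""
    return {"name": name, "sources": sources, "tools": tools, "checksums": {}}


def record_output(description, filename, data):
    """After the build, add the checksum of one resulting binary."""
    description["checksums"][filename] = hashlib.sha256(data).hexdigest()
    return description


def verify(description, filename, rebuilt):
    """Somebody else rebuilds and compares against the recorded checksum."""
    expected = description["checksums"].get(filename)
    return expected == hashlib.sha256(rebuilt).hexdigest()


# Hypothetical package, sources and tool versions, for illustration only.
info = describe_build("hello", ["hello-2.10.tar.gz"], {"gcc": "5.2.0"})
record_output(info, "hello.deb", b"bytes of the built package")

print(verify(info, "hello.deb", b"bytes of the built package"))  # True
print(verify(info, "hello.deb", b"bytes of a different build"))  # False
```

Everything except the checksums is known before the build runs, which is the property the paragraph above asks for.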
Is there a build tool that works across distributions and would allow us to have the BuildInfo specification (except the checksums) before the build process? Of course there is: Nix.
What many do not know about Nix is that it is first and foremost a build tool. It just so happens that there is a database of packages with descriptions of how to build them, and as a side effect Nix can also be a package manager. But first of all it is a build tool. Nix can build .deb or .rpm packages or any other format you want.
What I would like to see is that whoever is looking in this direction gives Nix a try and at least learns from it, because the Nix and NixOS community has been doing build reproducibility for the last 10 years already.
A few things I want you to take from this blog post are:
- Go to the next Reproducible Builds summit. It was great. I hope to be there too.
- Reproducibility is a process and not a state.
- There are many useful steps before you reach reproducible nirvana. It might make sense to market those as well.
- Nix is a cross-distribution build tool. Use it. I know I will :)