# Give AI a Clean Kitchen

> You can't make AI predictable. But you can make everything around it predictable. That changes more than you'd think.


**TL;DR:** You can't make AI deterministic. Stop trying. Instead, lock down everything around it: dependencies, environments, builds. When the boring parts are reproducible, AI's randomness becomes the only variable, and that's a completely different debugging experience.

---

I watched a chef burn a steak last week. Good chef, nice restaurant, just one of those nights. Same recipe he's made a thousand times, different result. That's cooking. And honestly? AI works the same way. Same prompt, different answer, every time.

Predictable? Not really. Not either of them. I stopped fighting that a long time ago.

What does work in a kitchen, though? Not eliminating surprises; nobody succeeds that way. You keep the place clean. Ingredients fresh. Knives sharp. You control the stuff you can control, and the creative parts find their own way.

So what if we treated AI the same way? Stop trying to make it deterministic. Just give it a clean kitchen.

## Why Can't You Make AI Predictable?

I got into reproducible environments for selfish reasons, honestly. I was tired of "works on my machine." Locked dependencies, immutable builds, deterministic tooling. [Nix](https://nixos.org/) made most of this possible, once I survived the learning curve.

But lately something funny happened: maybe all that plumbing I laid down for myself is secretly what AI was missing too.

If you've used AI tools for any real project, you know. You build a model, deploy it, and a different machine gives you different numbers. Run the same prompt twice, two different answers. Your teammate tries your code, something else entirely.

And this isn't a bug you can fix. Temperature, sampling, model updates. It's non-deterministic at every layer.

The first thing everyone tries is fighting it. Pin the model version. Crank temperature to zero. Force consistency wherever possible.

I tried that too. It's like timing your cooking down to the microsecond. Sure, you get consistency. You also get food nobody remembers.
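Why does cranking temperature to zero force consistency in the first place? A toy sketch of temperature-scaled sampling makes it concrete (this is an illustration of the general decoding mechanism, not any particular model's implementation):

```python
import math
import random

def sample_token(logits, temperature, rng=random.Random()):
    """Pick an index from logits. Temperature 0 means greedy decoding."""
    if temperature == 0:
        # Greedy: always the highest logit, fully deterministic.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature: higher temperature flattens the
    # distribution, so lower-probability tokens get picked more often.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.5]
print(sample_token(logits, 0))    # always index 0
print(sample_token(logits, 1.0))  # varies run to run
```

Even at temperature zero, batching, model updates, and hardware differences can still nudge the logits themselves, which is why pinning sampling alone never fully closes the gap.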

## What Happens When the Ground Shifts Under Your AI?

I was talking to a team not that long ago. They'd built an AI feature, demoed it internally, looked great. Staging, perfect. Production? The numbers made no sense. Completely different from anything they'd seen in testing.

They spent two days pulling their hair out before someone checked the BLAS libraries. Production was linked against MKL. Staging had OpenBLAS. The devs' laptops on macOS were using Apple's Accelerate. Three different linear algebra backends, each optimizing matrix math differently. Tiny floating-point differences that compounded over thousands of iterations into completely different AI outputs.
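The mechanism behind that divergence is ordinary floating-point arithmetic: addition isn't associative, so two backends that sum the same numbers in a different order can disagree in the last bits. A self-contained illustration:

```python
# Floating-point addition is not associative, so backends that
# accumulate in a different order produce slightly different results.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False

# The same effect, more dramatic: 1e16 + 1.0 rounds back to 1e16,
# so the order of operations decides whether the 1.0 survives at all.
print(sum([1e16, 1.0, -1e16]))  # 0.0
print(sum([1e16, -1e16, 1.0]))  # 1.0
```

One reordered reduction per matrix multiply, times thousands of iterations, and "the same model" stops producing the same numbers.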

Same model. Same prompts. Same code. The ground underneath had shifted in a way nobody thought to check.

We've all dealt with "works on my machine." This is that problem, but worse. Traditional software bugs stay bugs. AI environmental differences produce behavior that looks intentional and is almost impossible to trace back to the cause.

Every story like this is really an environment problem wearing an AI mask.

I ran into a similar thing myself, less dramatic but just as annoying. I was using Claude Code to refactor a module in a Nix project. Worked great on my machine. Clean diffs, tests passing, everything tidy. A colleague tried the same prompt on the same codebase. Got different suggestions. Not wrong, just different enough that we couldn't review them side by side.

Took us an hour to figure out that his shell was sourcing a different Python version from somewhere on his PATH. Claude Code was reading his shell environment and tailoring suggestions to what it found there, so the two of us were effectively asking about two different projects. Not a fair comparison at all.

Once we locked both machines to the same Nix flake, same prompt produced the same suggestions. Not identical token-for-token, that's not how language models work. But structurally the same. Same approach, same imports, same test patterns. Close enough to actually collaborate on.
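For anyone who hasn't seen one, "locked to the same Nix flake" can be as small as this sketch (package names and channel are illustrative, not the actual flake we used):

```nix
{
  description = "Pinned dev shell so every machine sees the same toolchain";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
    in {
      devShells.${system}.default = pkgs.mkShell {
        # Everyone who runs `nix develop` gets exactly these versions,
        # resolved through flake.lock, not whatever is on their PATH.
        packages = [ pkgs.python312 pkgs.ruff ];
      };
    };
}
```

The `flake.lock` file that Nix generates alongside this is what actually froze both machines to the same inputs.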

## How Do You Debug AI When Everything Else Is Moving?

You can't make AI predictable. Fine. But you can make everything *around* it predictable, and that changes the game more than you'd think.

Lock your dev environment. Pin your dependencies. Make your builds reproducible. Now AI's randomness is the *only* variable left. That's a totally different debugging experience.

Your teammate clones your setup and gets a different result? Now you're looking at genuine AI variance, not environment drift. That distinction matters more than people realize. When you know the environment is identical, you can actually reason about why the AI did something different. Was it the prompt? The context window? A model update? Those are answerable questions. "Is it my BLAS library or the model?" is not.

There's a compounding effect with iteration speed too. When your test cycle is 10 minutes instead of 2 hours because you're not wrestling with environment issues, you try more things. AI work is mostly trial and error. Your gut feeling about which prompt will work is usually wrong. Speed of iteration ends up being the whole game.

Things I learned by getting them wrong first:

**Pin your runtime.** Not just the language version. The system libraries. The CUDA version if you're doing GPU work. The exact compiler. One team I worked with had three different GCC versions across their CI runners. The AI model was identical on all three. The results were not.

**Hash your dependencies.** A `requirements.txt` with `==` versions is better than nothing. A lock file with hashes is better. A Nix derivation that captures the entire dependency closure is best, but pick whatever you'll actually maintain.
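The middle option looks like this in the Python world: `pip-compile --generate-hashes` emits a lock file where every artifact carries its digest, and `pip install --require-hashes` refuses anything that doesn't match. (The digests below are placeholders, not real hashes.)

```text
# Generated with: pip-compile --generate-hashes requirements.in
# (hash values are placeholders for illustration)
numpy==1.26.4 \
    --hash=sha256:<digest-of-the-wheel-you-tested> \
    --hash=sha256:<digest-of-the-sdist>
```

With hashes present, a silently re-uploaded package fails the install instead of silently changing your environment.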

**Version your prompts.** This one's newer. If you're calling AI APIs in production, your prompts are code. Treat them that way. Put them in version control. Diff them. Review changes. Change one word in a prompt and the output can look completely different. If nobody tracked what changed, you'll spend hours chasing ghosts.
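A minimal sketch of what "prompts are code" can mean in practice: load them from files under version control and log a content hash with every AI call, so output changes can be traced to prompt changes. The directory layout and logger are hypothetical.

```python
import hashlib
from pathlib import Path

# Hypothetical layout: prompts live in the repo like any other code.
PROMPT_DIR = Path("prompts")

def load_prompt(name: str) -> tuple[str, str]:
    """Return the prompt text plus a short content hash to log
    alongside every AI call made with it."""
    text = (PROMPT_DIR / f"{name}.md").read_text(encoding="utf-8")
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return text, digest

# prompt, version = load_prompt("summarize-ticket")
# log.info("ai_call", prompt_version=version)  # hypothetical logger
```

When an output regression shows up in production logs, the prompt hash tells you immediately whether the prompt changed or something else did.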

**Make CI reproducible.** Fresh VM every build sounds clean, but if that VM pulls updated system packages weekly, your CI environment is drifting even when your code stays the same. I watched one team lose three days to a "flaky AI test" that was actually a silently upgraded numpy.
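One way to stop that weekly drift, if your CI runs in containers, is pinning the base image by digest instead of by tag. A sketch (the digest is a placeholder you'd fill in from the image you actually tested against):

```dockerfile
# A tag like "python:3.12-slim" is re-pushed regularly and drifts.
# A digest freezes the exact image bits (placeholder digest below).
FROM python:3.12-slim@sha256:<digest-of-the-image-you-tested>

# Hash-checked install: anything not in the lock file fails the build.
COPY requirements.txt .
RUN pip install --no-cache-dir --require-hashes -r requirements.txt
```

Now the silently upgraded numpy can't happen: the image can't change under you, and the dependencies can't either.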

The tools are out there. Nix for bit-for-bit reproducible builds. Docker for isolation. Even a careful `requirements.txt` with pinned hashes beats winging it. Pick whatever works for you. The point is the mindset: everything around AI should be a known quantity.

## What Did Reproducible Builds Really Give Me?

For a long time I thought reproducible builds were just good hygiene. Discipline. The "right way" to do things, if you cared about craft.

Turns out they're really about making room for the stuff you can't control.

I noticed this most clearly when I started using AI coding tools daily. On days when my environment was clean, when `nix develop` dropped me into a shell where everything just worked, I spent my mental energy on the actual problem. Crafting better prompts. Evaluating suggestions critically. Thinking about architecture instead of fighting my toolchain.

On days when something was off, a broken symlink, a mismatched library, a stale cache, I spent that same energy on the wrong fight. The AI was fine. My kitchen was dirty.

When the boring parts are locked down, when the kitchen is clean and the knives are sharp, you stop fighting your tools. You start paying attention to the interesting parts. The creative parts. The parts where AI actually shines.

You don't need AI to be predictable if everything around it already is.

My chef friend still burns a steak now and then. His kitchen stays clean though. Ingredients fresh, knives sharp. Not because it prevents mistakes, but because it lets him focus on the cooking.

Give AI a clean kitchen. A stable home. Then get out of the way and let it surprise you.

---

*Twenty years of fighting dev environments, and somehow I ended up writing about kitchens. Giving talks on AI and developer tooling throughout 2026 if you want to argue about reproducibility in person. [LinkedIn](https://www.linkedin.com/in/garbas/).*

