Of LLMs and Cheese
Or, how to work with non-deterministic systems.
I mentioned before how working with non-deterministic systems (e.g. the output from LLMs) takes a bit of getting used to, and often requires finding the right use-case, one where it’s preferable or at least tolerable. I also mentioned that I’ve launched products like this before. This post will cover some of those use-cases, with principles and concrete examples.
Note that of the three main current use-cases for LLMs — personal use, enterprise automation, and embedding in products — this post centres on the last. It’s about how to work with fuzzy, unpredictable output, and how to compensate for the times it goes wrong in front of end-users.
What does cheese have to do with it? Read on…

When to Loop Humans In
Human-centricity matters in any application, and knowing when and how to involve humans in the process is critical. I’ll give you two contrasting examples, to show the range of design thinking needed.
The first product I launched was around data leak prevention: attempting to stop people from accidentally disclosing sensitive information. The first version was heuristic and tended to warn users too often, so people got used to clicking OK. Switching to natural language processing was better; it wasn’t perfect, but when a warning did pop up, users paid more attention. That’s an example where imperfect prediction is still preferable overall.
Another product had to do with data retention: identifying sensitive info that should be disposed of (y’know, before the hackers steal it). The problem was that the founder was a big believer in using AI for decision support without letting it make final calls. That left the information reviewers with lists of tens of thousands of entries, which they had to go through manually. It’s hard enough to explain to archivists that data past its retention schedule is a liability and needs to be disposed of; giving them a tool that didn’t actually help was useless.
It should be painfully obvious that the key is knowing your users and understanding the context they operate in. That’s Product Management 101, regardless of the fact that we’re dealing with non-deterministic, fuzzy systems.
So how can you build a product around this? What forces shape your decisions, and what options might you deploy?
Consider the impact on human lives when it goes wrong
Always, always start with that. Sometimes a human must review and approve everything; other times it’s better to handle things by exception.
Consider the scale of interactions, and design a UI to support that
There’s a well-documented automation bias where people accept the ‘verdict’ of any algorithm as unbiased and correct, even when it isn’t. They also get used to clicking ‘OK,’ especially when it’s not their main task.
Think about second-order effects
These are the unintended consequences of your design choices. They’re very hard to imagine up front, but bringing in some QA and security experts and observing real user interactions can help.
Think about the tools
“AI” is a broad umbrella term, more marketing than any specific technology. As we know, a lot of it was built on plagiarism and slave labour, and it encodes some horrible Western-white-male biases. Think about how you can use the tool in ways that do not cause harm to vulnerable people and the environment.
Design escalation paths to a human
When does the human need to be involved? How do affected parties (including the direct user!) object to the output? Who monitors and handles those objections? Governance isn’t a policy document in SharePoint, it’s an ingrained practice.
Security by Design
I wrote before in general about securing generative AI, which — spoiler — comes down to putting your black hat on, thinking of all the possible second-order effects, and restricting inputs, outputs, and undesirable behaviour. A lot of the problems arise from this being new (so new attack vectors), the companies behind it prioritising addiction to win market share, and humans being humans (both gullible and misanthropic). That original article stressed exploring all the environmental, social, and governance factors you need to understand and build towards, and applying sound user-centric (human-centric, really) product management principles.
In contrast, I now want to highlight how to practically embed an LLM in your product, using the principle of Refrain, Restrict, and Trap. That framing was first put forward in a research article about protecting against prompt injection. It comes down to choosing the use-case carefully, operating on least-privilege principles, and carefully controlling the inputs to and outputs from the model.
Keen observers will note that this is pretty much the opposite of what the industry is pushing you towards: from OpenAI’s agents, to Perplexity’s agentic browser, to how MCP has been built by children. You can either let an unpredictable device run amok with your money, or you can have security — but not both.
Anyway, back to RRT. While the above article provided guidelines, here’s an example implementation from a PM’s perspective. We built an answer engine as part of our search capability in the DXP (Digital Experience Platform; think websites on steroids). Our customers are governments, higher education, and highly-regulated industries — but they do like to keep up with the times. Offering conversational search, natural-language responses to queries, can help their audience, so long as it’s accurate and safe. It’s genuinely useful to have a technology that can match existing content to questions and provide direct answers, so users don’t need to click through a list of blue links.
Here’s a basic outline of how we ensure that safety and accuracy:
Refrain
The most basic thing is to understand when the technology is suitable and when it isn’t. A law firm can use it to help find partners who specialise in a particular area, but will never use it for legal advice. A university can use it to help prospective students navigate campus life, but not financial aid.
Restrict
The solution (which is much more than simple RAG) only has access to information about the selected topics. It doesn’t have access to backend systems. Input is limited (most jailbreaking techniques require fairly long inputs) and filtered for allowed and disallowed topics.
Trap
The user’s query (input) is converted to a question, which helps trap anything untoward. The LLM’s output is then checked for groundedness in and faithfulness to the source material, as well as compliance with other instructions. Only then is it passed back to the user.
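To make the Restrict and Trap steps on the input side a bit more concrete, here’s a minimal sketch in Python. The helper names (classify_topic, call_llm), the topic list, and the length limit are hypothetical placeholders for illustration only, not our actual implementation, which is considerably more elaborate.

```python
# Sketch of the Restrict / Trap steps on the input side.
# classify_topic and call_llm stand in for whatever classifier and model
# client you use; the topic list, length limit, and prompt are illustrative.

from typing import Callable

MAX_INPUT_CHARS = 300  # short inputs make most jailbreak prompts impractical
ALLOWED_TOPICS = {"admissions", "campus life", "library services"}


def restrict_and_trap_input(
    user_input: str,
    classify_topic: Callable[[str], str],  # returns a topic label, or "other"
    call_llm: Callable[[str], str],        # your model client of choice
) -> str | None:
    """Return a cleaned-up question to send downstream, or None to refuse."""
    # Restrict: cap the input length.
    text = user_input.strip()
    if not text or len(text) > MAX_INPUT_CHARS:
        return None

    # Restrict: only allow configured topics.
    topic = classify_topic(text)
    if topic not in ALLOWED_TOPICS:
        return None

    # Trap: reformulate the input as a plain question, which strips out
    # most attempts to smuggle instructions into the query.
    question = call_llm(
        "Rewrite the following search input as a single, neutral question "
        "about " + topic + ". Output only the question.\n\n" + text
    )
    return question
```

The point of the ordering is that the cheap, deterministic checks run first, and the model only ever sees a reformulated question rather than the raw input.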
This is just one example of a case where fuzzy output (being able to tailor the info to the specific question, especially in under-served, non-critical areas) is useful, but where you still need to control for undesirable behaviour. We put a lot of thought into this, building the checks (Traps) into both input and output to ensure that the responses are accurate and do not cause harm.
In technical terms (see the marketing terms below 😉), these are user-input sanitisation and an AI eval system that includes both deterministic checks and judge LLMs. There are plenty of articles about this (e.g. this recent one from Lenny). I’d also suggest you visit OWASP’s page on LLM risks, which covers many common pitfalls and mitigation techniques. It’ll do you, and your engineers, a world of good to keep them in mind when designing AI-backed solutions.
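For completeness, here’s what the output-side check might look like in the same hypothetical sketch: a deterministic check followed by a judge LLM rating groundedness. The denylist, judge prompt, and passing threshold are again illustrative, not our production eval suite.

```python
# Sketch of the output-side Trap: a deterministic check plus an LLM judge.
# call_llm again stands in for your model client; the denylist, prompt
# wording, and threshold are placeholders for illustration.

from typing import Callable

DENYLIST = ("legal advice", "financial aid decision")  # example refusal phrases


def check_answer(
    answer: str,
    source_passages: list[str],
    call_llm: Callable[[str], str],
    min_score: int = 4,
) -> bool:
    """Return True only if the answer passes both deterministic and judge checks."""
    # Deterministic checks: refuse empty answers and denylisted phrases.
    lowered = answer.lower()
    if not answer.strip() or any(term in lowered for term in DENYLIST):
        return False

    # Judge LLM: rate how grounded the answer is in the retrieved sources, 1-5.
    verdict = call_llm(
        "Rate 1-5 how fully the ANSWER is supported by the SOURCES, "
        "where 5 means every claim is supported. Reply with the digit only.\n\n"
        "SOURCES:\n" + "\n---\n".join(source_passages) + "\n\nANSWER:\n" + answer
    )
    try:
        return int(verdict.strip()) >= min_score
    except ValueError:
        return False  # an unparseable verdict fails closed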
… and Cheese
I promised at the top that this has to do with cheese. So here’s a mental model you won’t forget in a hurry: think of LLMs as ripe Gorgonzola.
It’s fuzzy
Just like a good cheese has some fuzzy mould around (and inside) it, so do LLMs (and neural nets in general) have an in-built fuzziness of output. These probabilistic models mean the output isn’t sharply defined, but has a nice fuzz about it.
Some people love furry cheeses. Some people don’t. Some use-cases can thrive with this. Others can’t stand it.
It can be smelly
If you don’t pay attention to the fuzz, you can end up with something nasty and poisonous. Also, some people just don’t like it — don’t shove it down their throats. Keep tight controls and governance, and pull your users in rather than pushing it on them. Otherwise things will take the shape of that accompanying pear.
It’s a Marketer’s Dream
Our marketers named the approach above (using judge LLMs to cross-check inputs and outputs) “Guardian Agents.” And that’s just… cheesy 😜
But it does land well with customers, and lets us explain the methodology without getting too technical. True agency is years away, but with a combination of heuristic and probabilistic checks we can build a workflow that executes a specific task with fuzzy inputs and outputs, and is nonetheless useful. As long as we can deliver on the promise of ‘guardian agents’ — and we do, by guiding our users and not over-promising — there’s nothing wrong with a bit of cheese on occasion.
Too much isn’t good for you
Too much cheese and your doctor will start getting concerned about your cholesterol. Too much LLM, and your audience and the environment won’t appreciate it. We make sure to use the smallest, most ethical models we can that still reliably generate the right result, in order to minimise harm. Since choosing ‘ethical models’ at this time is a bit of a least-worst-option exercise, we’re poised to replace them when better options become available (i.e. no vendor lock-in). In the meantime, we minimise environmental and social harms as much as we can, and only use the functionality when the benefits outweigh the costs.
Hope you found this useful! Getting used to working with non-deterministic systems takes a bit of adjustment. Getting excited about a new hammer and looking for things to bash isn’t a good tactic, but using it when appropriate can be a wonderful boost.
Where and how have you seen good integrations of fuzzy systems like LLMs into products?

