Make Illegal AI Edits Unrepresentable

This post is also available in video form:

These are strange times we're living in. We have unleashed an army of junior developers on our codebases while changing little else, yet we expect everything to just be okay.

Granted, they aren't exactly junior developers; they are both smarter and dumber, all at once. But it doesn't change the reality that the cumulative effect of running all sorts of AIs in our codebases, with little to no supervision, can, and likely will, lead to ever increasing complexity and maintainability woes.

"No, no, I'm doing stringent code review on every line of code the AI generates", sure you do, buddy. You're running five agents, all at once, and then expect that you'll be able to thoroughly review every line of an ever growing volume of generated code?

Imagine a well-functioning team of five senior developers, working on a large codebase. The complexity is there, slowly growing, but they manage to wrangle it, somehow. Then they bring on some young blood, a brand new junior developer, fresh off the boat from college. The codebase is large and complex, unsurprisingly, the junior makes mistakes. But it's okay, the seniors are there, at least one of them is always available to guide, review, and clean up any mess that comes up. After a while, the junior is no longer a junior, but a young senior. Soon we'll have a team of six seniors, and we can bring in some more fresh blood. All is as it should be.

Next, imagine the same scenario, but now we bring in 25 juniors in one batch. If you think that it's going to work out fine, you're most likely delusional. The seniors will have no way to keep up with the juniors, and the code will end up in shambles.

More generally, when many people work on the same system all at once, be it seniors, juniors, or anything in-between, code quality tends to quickly plummet. It appears that humans, juniors and seniors alike, are pretty bad at taming complexity. They have trouble keeping track of enough context to see the consequences of small changes rippling across code and time.

Unfortunately, current generation AIs suffer from the same problem. The only difference is that the number of AIs working on code seems to explode these days, overwhelming whatever benefit that experienced developers can contribute to the process.

In the long run, if we want to be productive using AIs to develop code, we must use better ways to handle complexity.

Even before the advent of widespread AI tools, I spent most of my professional career thinking about the problem of keeping code complexity at bay in the presence of humans. And now I do the same in the presence of an army of AIs.

One of the most useful tools for the purpose of complexity management that I know of is the design principle coined by Yaron Minsky that says:

Make illegal states unrepresentable

And that's what I want to talk about now. To put this into today's context, we'll paraphrase a little bit. Our goal will be to:

Make illegal AI edits unrepresentable

The Pitch

I won't delve into explaining what "illegal states" or "representability" are, hopefully most of you have some intuition about them¹. What I want to highlight though, is that "making illegal states unrepresentable" is an excellent complement to the weaknesses of both humans and AIs that I described above.

The way one would typically enforce the unrepresentability of illegal states is at compile-time, using the type system. Unlike your average AI, the compiler is enforcing type constraints by taking your whole program as context. It doesn't "forget" things just because they are too far away in code or time. And while the wet and imprecise brain of an AI can miss things, the cold and calculating brain of a compiler never forgets anything you ever taught it.

The biggest takeaway from this is that the more rules and invariants we encode into the type system, the more helpful the compiler will be in enforcing those invariants. Deterministically and without fail. So if some illegal state needs to be avoided globally and forever, make it unrepresentable at compile-time, and whether you or your AI remember about it while editing code won't matter in the least.

Let's illustrate this with a concrete example.

The Illegal States

Let's pretend we are programming a coffee-making robot². Something like this:

There are many subsystems involved in operating such a thing. We'll focus on the coffee-ordering part. Since we are just a small startup, we start with a small proof of concept. For now, we support only one type of drink.

Apparently, cappuccino is very popular these days, so that's a good starting point for us. To order cappuccino the user needs to tell us what kind of milk should be used. We'll model the order with a record class³:

record CoffeeOrder(
    @Nonnull Milk milk) {}

It appears that choosing milk nowadays is a non-trivial task, so Milk is actually an enum with multiple choices. Notice the Nonnull annotation on the Milk field. I hear that nulls can be dangerous, so we are very explicit about how we handle them. Every cappuccino must have Milk specified.

After successfully making a cappuccino, we want to expand our repertoire. So might as well make a plain espresso, we already have all the ingredients to make it. Espresso is just cappuccino minus the milk:

record CoffeeOrder(
    @Nullable Milk milk) {}

Now milk is a nullable field. If the value is present we make a cappuccino, if not, we make an espresso.

We got the ball rolling, so let's make another drink: latte. It's just like cappuccino, but with more milk. So both latte and cappuccino must have Milk specified. The presence or absence of the milk field is no longer enough to indicate which type of drink we need. We'll have to disambiguate the drinks somehow:

enum DrinkType {
  CAPPUCCINO,
  ESPRESSO,
  LATTE,
}

The DrinkType enum will indicate which drink the user ordered. It can be used like so:

record CoffeeOrder(
    @Nonnull DrinkType drinkType,
    @Nullable Milk milk) {}

Now we can tell which type of drink to prepare, even when the milk field is present. We are still careful with nulls: drinkType is mandatory. Otherwise, we won't be able to tell which drink the user actually ordered. This is a nice way to future-proof ourselves so that we can handle other drinks as well.

With support for some basic drinks in place, we want to show how fancy our robot is. Let's add a couple more special drinks: affogato and Irish Coffee.

The new drinks require more ingredients, so we add them to the CoffeeOrder record:

record CoffeeOrder(
    @Nonnull DrinkType drinkType,
    @Nullable Milk milk,
    @Nullable Gelato gelato,
    @Nullable Cream cream,
    @Nullable Whiskey whiskey) {}

Here we have gelato for the affogato, and cream and whiskey for the Irish coffee. The new fields are all nullable, as they are only relevant for these specific drinks.

We also update DrinkType to support the new drinks:

enum DrinkType {
  CAPPUCCINO,
  ESPRESSO,
  LATTE,
  AFFOGATO,
  IRISH_COFFEE,
}

If we take a step back and look at what we have here, we'll see that although we are only supporting five drinks, this is already quite the mess. The CoffeeOrder record is definitely capable of representing all the drinks we want it to. But along with that it's capable of representing nonsensical drinks. Some of them, like espresso with gelato are just redundant. But others make no sense at all. Like Irish coffee without whiskey, which is an illegal state if I ever seen one. And so, we made illegal states representable in our system.

This example might look a bit contrived, who ever just adds many fields like this to a class all at once? But this is based on actual experience, where a team of developers keeps adding fields over time, without immediately noticing the implications. And before you know it, you end up with a monster full of potential illegal states.

All is not lost though, we are conscientious developers. We did our best with the nullability annotations. We can also add our knowledge about the different invariants to the code as well. Here's the final version of CoffeeOrder:

/**
 * Invariants:
 * - For cappuccino: milk must not be null, all other nullables should be null
 * - For espresso: all nullable field should be null
 * - For latte: milk must not be null, all other nullables should be null
 * - For affogato: gelato must not be null, all other nullables should be null
 * - For Irish coffee: cream and whiskey must not be null, all other nullables should be null
 */
record CoffeeOrder(
    @Nonnull DrinkType drinkType,
    @Nullable Milk milk,
    @Nullable Gelato gelato,
    @Nullable Cream cream,
    @Nullable Whiskey whiskey) {
}

This comment is very thorough, now anyone who forgets about the invariants has a complete reference. Is this good enough?

Getting in the Flow

How do we actually use CoffeeOrder? We are in the business of taking orders and operating robots. So here's the pipeline that we'll be implementing:

In this flow we take in a new CoffeeOrderDTO, which represents the raw data we got from the user. It's defined like so:

record CoffeeOrderDTO(
    @Nullable DrinkType drinkType,
    @Nullable Milk milk,
    @Nullable Gelato gelato,
    @Nullable Cream cream,
    @Nullable Whiskey whiskey) {}

Basically the same thing as CoffeeOrder, but now everything is nullable and we don't assume any invariants about it. We have to validate CoffeeOrderDTO before we can convert it into a proper CoffeeOrder. This is where OrderProcessor comes into play. The process method takes in a CoffeeOrderDTO and converts it into a CoffeeOrder. Here we'll be careful to maintain all the CoffeeOrder invariants that we care about. With a CoffeeOrder ready we send it to CoffeeMachine, which will produce the instructions to the robot according to the CoffeeOrder we have.

To keep things contained, we'll only focus on a single drink. And so we'll just look at the code that is responsible for Irish coffee.

Let's start with OrderProcessor:

public class OrderProcessor {

  public CoffeeOrder process(@Nonnull CoffeeOrderDTO order) { // 1
    var drinkType = // 2
      order.drinkType() == null ?
        DrinkType.CAPPUCCINO : order.drinkType();

    return switch (drinkType) { // 3
      ...
      case IRISH_COFFEE -> processIrishCoffee(order); // 4
    };
  }

Here we take the input CoffeeOrderDTO (1). As a first step we are trying to determine the DrinkType. Since it cannot be missing, we provide a default value (cappuccino) in case it's null (2). We then switch on the DrinkType (3). In case it's IRISH_COFFEE we process the order using the processIrishCoffee method (4). Which is defined as follows:

private CoffeeOrder processIrishCoffee(CoffeeOrderDTO order) {
    return new CoffeeOrder( // 1
        DrinkType.IRISH_COFFEE, // 2
        null,
        null,
        order.cream() == null ? Cream.HALF_AND_HALF : order.cream(), // 3
        order.whiskey() == null ? Whiskey.JAMESON : order.whiskey()); // 4
}

We finally create a CoffeeOrder (1) of type IRISH_COFFEE (2). At this point we have to populate its fields according to the invariants that we defined for Irish coffee. So we extract the Cream value, and provide a fallback in case it's missing (3). And do the same for the Whiskey value (4). The rest is not relevant for Irish coffee, so we set it to null.

With this code in place, we kept all the invariants that are dictated by CoffeeOrder. Illegal states are nowhere in sight. We could even write a unit test to make sure this is so.

The next step is to program the CoffeeMachine.

Suppose we have a method that tells the robot to prepare the Irish coffee. Given the user's choice of cream and whiskey it instructs the robot to prepare Irish coffee:

private void prepareIrishCoffee(
  @Nonnull Cream cream,
  @Nonnull Whiskey whiskey)

Because Irish coffee makes no sense without cream and whiskey, the method requires the cream and whiskey arguments to be non-null. This reflects the rationale for the invariants in CoffeeOrder.

We'll use the prepareIrishCoffee method in CoffeeMachine.prepare when converting a CoffeeOrder into instructions for the robot:

public class CoffeeMachine {
  public void prepare(CoffeeOrder order) {
    switch (order.drinkType()) { // 1
      ...
      case IRISH_COFFEE -> { // 2
        prepareIrishCoffee( // 3
            order.cream(),
            order.whiskey());
      }
    }
  }
}

We switch on the order's DrinkType (1) and in the case where the drink type is IRISH_COFFEE (2) we extract the cream and whiskey values and pass them on (3). That's okay, right? We have all those invariants in place (here's to good documentation) because it makes no sense for those values to be empty at this stage. OrderProcessor made sure of this.

Unfortunately, this doesn't compile:

CoffeeMachine.java:23: error: [NullAway] passing @Nullable parameter 'order.cream()' where @NonNull is required
            order.cream(),
                       ^
CoffeeMachine.java:24: error: [NullAway] passing @Nullable parameter 'order.whiskey()' where @NonNull is required
            order.whiskey());
                         ^
2 errors

Ah, we just specified that prepareIrishCoffee takes non-null values, and despite our invariants, nothing guarantees to the compiler that the values are actually not null. But we know that everything is fine here. Let's insist on that:

public class CoffeeMachine {
  public void prepare(CoffeeOrder order) {
    switch (order.drinkType()) {
      ...
      case IRISH_COFFEE -> {
        // This is SAFE // 1
        // Irish coffee ALWAYS has cream and whiskey
        if (order.cream() == null) // 2
          throw new IllegalStateException("No cream");
        if (order.whiskey() == null)
          throw new IllegalStateException("No whiskey");

        prepareIrishCoffee(
            order.cream(),
            order.whiskey());
      }
    }
  }
}

We added a defensive null check to satisfy the compiler (2), and we throw an exception if the invariant is broken. But we know it's safe, and just to be extra careful we leave an emphatic comment for future reference about just how impossible it is for the values to be missing here (1). With the null check in place this code now compiles.

Our pipeline is now complete, all invariants are kept. We can even run it to make sure it works:

// 1
var processor = new OrderProcessor();
var coffeeMaker = new CoffeeMachine();

// 2
var dto1 = new CoffeeOrderDTO(
    DrinkType.IRISH_COFFEE, null, null, Cream.HEAVY, Whiskey.TULLAMORE);

// 3
var dto2 = new CoffeeOrderDTO(
    DrinkType.IRISH_COFFEE, null, null, Cream.HEAVY, null);

// 4
var order1 = processor.process(dto1);
var order2 = processor.process(dto2);

// 5
System.out.println(order1);
System.out.println(order2);

// 6
coffeeMaker.prepare(order1);
coffeeMaker.prepare(order2);

In this simple script we initialize the different components of the flow (1). We then create a couple of CoffeeOrderDTOs. The first is for Irish coffee, with all the relevant data present (2). The second is missing a whiskey value (3). Next we process both orders and generate CoffeeOrder instances (4). We print those values so that we can see they are okay (5). Lastly, we prepare the orders (6).

Running this produces the following output:

CoffeeOrder[drinkType=IRISH_COFFEE, milk=null, gelato=null, cream=HEAVY, whiskey=TULLAMORE]
CoffeeOrder[drinkType=IRISH_COFFEE, milk=null, gelato=null, cream=HEAVY, whiskey=JAMESON] // 1
Making Irish coffee with: HEAVY cream and TULLAMORE whiskey
Making Irish coffee with: HEAVY cream and JAMESON whiskey // 2

We can see that the missing whiskey value in the second order was filled in with the default value (1), and the robot happily fulfilled the order (2).

Everything is fine, no invariants were broken, no illegal states encountered.

This is our baseline, now let's see how things break over time...

AI vs. Illegal State

Time passes by, everything works correctly, and you forget all about this code. But then, a new requirement comes in. Recall how we used a hardcoded Whiskey.JAMESON value when the whiskey value is missing from the order. Well, we noticed that the whiskey stock in the coffee shop keeps fluctuating, and it's possible that Jameson is out of stock today. We no longer want to hardcode this value, instead we'll query a dedicated service to obtain the default value for today.

For this we'll have a new service definition:

interface WhiskeyService {

  @Nullable
  public Whiskey fetchTodayDefaultWhiskey();
}

It's just a simple fetcher. Since things can go wrong with fetching, the output is nullable, let our clients deal with that. We want to use an instance of this service to fetch the value instead of the hardcoded value we currently have.

Additionally, there's an implementation of WhiskeyService called WhiskeyServiceDefault:

public class WhiskeyServiceDefault implements WhiskeyService {

  @Nullable
  @Override
  public Whiskey fetchTodayDefaultWhiskey() {
    var rnd = new Random();
    var whiskies = Whiskey.values(); // 1

    var whiskey = whiskies[rnd.nextInt(whiskies.length)]; // 2

    return whiskey; // 3
  }
}

For purposes of demonstration, it just returns a random value each time it's called. In (1) we obtain an array of all available whiskies. Then we choose a random whiskey from all the possible whiskey values (2), and return it (3).

Since we're living in the future, we won't be implementing the new feature ourselves, manually, like some plebs. Instead, we'll use our handy AI agent⁴.

Here's the prompt for this task:

> In @OrderProcessor.java we no longer want to hardcode the
default `Whiskey` value since we don't know whether it's
available today or not. Use @WhiskeyService.java to inject
the value dynamically instead. In @Main.java use
@WhiskeyServiceDefault.java. Use `gradlew build` to verify
that everything compiles.

Nothing fancy here. The AI performs the relevant edits, without breaking a sweat. Here's the summary of what it's done:

● Done! I've successfully:

1. Modified OrderProcessor.java:25 to use
   whiskeyService.fetchTodayDefaultWhiskey() instead of
   hardcoded Whiskey.JAMESON
2. Added WhiskeyService dependency injection to
   OrderProcessor constructor
3. Updated Main.java:5-6 to create WhiskeyServiceDefault
   instance and inject it into OrderProcessor
4. Verified compilation with gradlew build

The new code ⁵ looks very reasonable, with all the modern bells and whistles of dependency injection, and whatnot. Here's the new code in processIrishCoffee:

private CoffeeOrder processIrishCoffee(CoffeeOrderDTO order) {
  return new CoffeeOrder(
      DrinkType.IRISH_COFFEE,
      null,
      null,
      order.cream() == null ? Cream.HALF_AND_HALF : order.cream(),
      order.whiskey() == null ? whiskeyService.fetchTodayDefaultWhiskey() : order.whiskey()); // 1
}

As we can see in (1) the default whiskey is no longer hardcoded, we delegate to WhiskeyService instead. Cool, this fulfills the requirements as we stated them.

We can run the example code from before:

CoffeeOrder[drinkType=IRISH_COFFEE, milk=null, gelato=null, cream=HEAVY, whiskey=TULLAMORE]
CoffeeOrder[drinkType=IRISH_COFFEE, milk=null, gelato=null, cream=HEAVY, whiskey=BUSHMILLS] // 1
Making Irish coffee with: HEAVY cream and TULLAMORE whiskey
Making Irish coffee with: HEAVY cream and BUSHMILLS whiskey

Since the default whiskey is no longer hardcoded, every time we run the code we'll get a different whiskey value. This time it was BUSHMILLS (1).

... cue in ominous music...

The AI fell for the trap I set for it. Can you see it?

As it currently stands the code doesn't exhibit any bugs. But after this simple edit a bug is latent in the codebase. Neither the AI, nor the human, or even the test suite caught anything.

We move on with our life, busying ourselves with other parts of the codebase. Until one day we get another feature request.

It appears the stock of whiskeys fluctuates so rapidly that we still sometimes try to serve whiskey that's out of stock, even without hardcoding the value. To fix this in the WhiskeyService implementation we are asked to check the current stock and avoid choosing a whiskey that is out of stock.

To do this we have another service we can query:

interface WhiskeyStockService {
  // invariant: all Whiskey types are present in the resulting Map // 1
  @Nonnull
  public Map<Whiskey, Integer> fetchTodayStock(); // 2
}

WhiskeyStockService.fetchTodayStock gives us a snapshot of the current stock (2). There's also a helpful comment about an invariant here, which we'll trust blindly. Now we can use WhiskeyStockService to implement the new requirement (for demonstration purposes WhiskeyStockServiceDefault will return random values for the stock).

Let's prompt the AI:

> In @WhiskeyServiceDefault.java it's possible that we used a
whiskey that's not in stock today. Use
@WhiskeyStockService.java to only return the chosen whiskey
if it's in stock at the moment. In @Main.java use
@WhiskeyStockServiceDefault.java. Use `gradlew build` to
verify that everything compiles.

The AI chugs along. It stumbles for a moment because the compiler was not impressed by the invariant that was stated in the comment in WhiskeyStockService. But finally we get back the results:

● Done! I've successfully updated the code to check whiskey
stock availability:

1. WhiskeyServiceDefault.java now uses WhiskeyStockService
   to validate stock before returning a whiskey
2. Main.java now creates and uses WhiskeyStockServiceDefault
3. The build compiles successfully with gradlew build

The implementation now only returns whiskeys that are in
stock (stock level > 0), and returns `null` if no whiskeys
are available.

The new WhiskeyServiceDefault:

@Nullable
@Override
public Whiskey fetchTodayDefaultWhiskey() {
  var rnd = new Random();
  var whiskies = Whiskey.values();
  var stock = stockService.fetchTodayStock();

  for (int attempts = 0; attempts < whiskies.length; attempts++) {
      var whiskey = whiskies[rnd.nextInt(whiskies.length)];
      var stockLevel = stock.get(whiskey);
      if (stockLevel != null && stockLevel > 0) {
          return whiskey;
        }
    }

  return null; // 1
}

Let's ignore the question of whether this fulfills the requirements; vague human language sure tends to produce vague results. What's important is that this code now exhibits a bug. Despite the @Nullable annotation, the original implementation of fetchTodayDefaultWhiskey would never return a null value in practice. With the new implementation this is no longer the case. The code can, and depending on today's stock, will eventually return null (1).

Why is this a problem? Recall that after our first round of edits we had this code:

return new CoffeeOrder(
    DrinkType.IRISH_COFFEE,
    null,
    null,
    order.cream() == null ? Cream.HALF_AND_HALF : order.cream(),
    order.whiskey() == null ? whiskeyService.fetchTodayDefaultWhiskey() : order.whiskey());

We blindly take whatever fetchTodayDefaultWhiskey returns and pass it along. The null checking that the compiler is doing for us doesn't help us, since whiskey is a nullable field. It shouldn't be null only in specific circumstances (DrinkType == IRISH_COFFEE) which the compiler is incapable of checking.

For bonus points, this bug in intermittent, it only happens in specific circumstances, good luck debugging it when it hits production.

If we run our test script, most of the times everything will be fine:

CoffeeOrder[drinkType=IRISH_COFFEE, milk=null, gelato=null, cream=HEAVY, whiskey=TULLAMORE]
CoffeeOrder[drinkType=IRISH_COFFEE, milk=null, gelato=null, cream=HEAVY, whiskey=BUSHMILLS]
Making Irish coffee with: HEAVY cream and TULLAMORE whiskey
Making Irish coffee with: HEAVY cream and BUSHMILLS whiskey

But some times, when we hit the right conditions (all chosen whiskeys are out of stock), we get an exception:

CoffeeOrder[drinkType=IRISH_COFFEE, milk=null, gelato=null, cream=HEAVY, whiskey=TULLAMORE]
CoffeeOrder[drinkType=IRISH_COFFEE, milk=null, gelato=null, cream=HEAVY, whiskey=null] // 1
Making Irish coffee with: HEAVY cream and TULLAMORE whiskey
Exception in thread "main" java.lang.IllegalStateException: No whiskey
        at CoffeeMachine.prepare(CoffeeMachine.java:20)
        at Main.main(Main.java:23)

We can see in (1) that we managed to get null for the whiskey field, and so the null checks we were forced to place in CoffeeMachine throw an exception.

Remember that emphatic comment we had above?

// This is SAFE
// Irish coffee ALWAYS has cream and whiskey

So much for the efficacy of comments...

The Root Cause

What went wrong? The root cause of this is that we had an important invariant that was not enforced at compile-time. It was only encoded as textual knowledge (comments), not anything that is mechanically enforceable. We made an illegal state representable in our system, and given enough code and time, we eventually managed to hit it.

It's no wonder that simple humans fall for such things all the time. The problem is that current AIs are prone to falling for this just the same. Both humans and AIs use a local context of some random size when writing code. And whether or not they will notice this or that global invariant being violated is given up to chance⁶.

"But", you might object, "I can cover the invariant with tests, and those tests will keep the AI in check". Alas, that's not good enough either. You can definitely cover all known relevant code paths with tests. But what about code paths that do not yet exist? How do you know what future code path you will create and how it interacts with the invariants you've set?

And to head off the obvious objection that your AI can handle this trap and not fall for it. Sure, it's a tiny codebase, with only a few files. I was even nice enough to spell out the invariant very explicitly in a comment⁷, in a way that an AI can definitely pick up. But this will only happen if the AI "sees" that comment. Given a sufficiently large codebase, with a sufficiently widespread invariant, your AI is bound to miss it as well.

The same applies for people that are more skilled at prompting than I am. Maybe you can prompt better, but nothing will save your prompt if you're not even aware of the danger you are trying to circumvent.

Whatever your objections are to the details of this conceptual example, you cannot change the fact that in general you will not be able to consistently enforce global invariants with tools that only apply to a limited local scope. Humans, current AIs, and unit tests, all fall under the category of "local" tools.

If an illegal state is representable in your system, you and your AI will eventually hit it. And those odds multiply as you run more and more agents generating ever increasing amounts of code.

The best way to not fall for it is to make the illegal state unrepresentable, and by so making the illegal AI edits we just seen unrepresentable as well.

The Unrepresentable

In a parallel universe, where developers know what's good for them and their sanity, every time they recognize an illegal state of the system, they make it unrepresentable. This is not always easy, and depending on what's at stake might not even be worth it. But for our example, the fix is fairly straightforward.

The problem starts with CoffeeOrder which is capable of representing illegal values. If we remodel CoffeeOrder to only allow legal combinations, the rest will follow. One of the best techniques to do that is to use a sum type. In Java this means using a sealed interface, where each admitted class represents one (legal) combination of values.

Let's rewrite CoffeeOrder⁸:

public sealed interface CoffeeOrder { // 1
  record Cappuccino(@Nonnull Milk milk) implements CoffeeOrder {} // 2

  record Espresso() implements CoffeeOrder {} // 3

  record Latte(@Nonnull Milk milk) implements CoffeeOrder {} // 4

  record Affogato(@Nonnull Gelato gelato) implements CoffeeOrder {} // 5

  record IrishCoffee( // 6
      @Nonnull Cream cream,
      @Nonnull Whiskey whiskey) implements CoffeeOrder {}
}

CoffeeOrder is now a sealed interface (1), with all the admitted classes nested in it. Each implementation of CoffeeOrder is one possible drink that we currently allow. Accordingly:

Cappuccino (2) has one Milk field, which is now enforced to be @Nonnull, as per the invariants we described before
Espresso (3) has no fields, so we no longer need to set anything to be nullable just for it
Latte (4) has a @Nonnull Milk as well, and we don't have any ambiguity with Cappuccino since it's a separate subtype
Affogato (5) has one mandatory Gelato field, which by virtue of being a separate class doesn't affect anything else
Lastly, IrishCoffee, our nemesis, explicitly declares its requirements with a pair of Cream and Whiskey mandatory fields

The CoffeeOrder datatype perfectly captures all the requirements that we stated before. Better yet, it does that at the type-level, where the compiler can enforce it. There's no longer a need for a comment nobody's going to read. CoffeeOrder is a precise specification of what is allowed in the system.

Whereas the original CoffeeOrder was "too big", allowing for a mismatch between its DrinkType and the nullable fields. The new CoffeeOrder subsumes DrinkType into its subclasses, perfectly aligning the required data with the drink being ordered.

Now we have to adapt the code to the new datatype, but this time we no longer have to worry about invariants, the compiler will nudge us in the right direction at every step.

We'll start with OrderProcessor, we only need to change processIrishCoffee:

private CoffeeOrder processIrishCoffee(CoffeeOrderDTO order) {
  return new CoffeeOrder.IrishCoffee( // 1
    order.cream() == null ? Cream.HALF_AND_HALF : order.cream(), // 2
    order.whiskey() == null ? Whiskey.JAMESON : order.whiskey());
}

The code looks pretty much the same, but now we initialize the specific subtype of CoffeeOrder that we need here, IrishCoffee (1). The cream and whiskey fields are initialized the same as in our baseline implementation (2). Unlike before we no longer have to provide the dummy null values for the fields that are irrelevant in this case. Because there aren't any, IrishCoffee is very precise.

What's invisible in this code is the fact that the arguments to IrishCoffee are not allowed to be null like they were before. We'll later see how this saves our code from bugs.

Next we'll adapt CoffeeMachine:

public void prepare(CoffeeOrder order) {
  switch (order) {
    ...
    case CoffeeOrder.IrishCoffee irishCoffee -> {
      prepareIrishCoffee(
          irishCoffee.cream(),
          irishCoffee.whiskey());
    }
  }
}

This code is the same as our very first attempt to write it, but this time the compiler won't complain. The fields of IrishCoffee cannot be null, and we can safely pass them to prepareIrishCoffee.

With these few small changes we managed to encode the invariants we care about globally, in a way that can be enforced mechanically and deterministically by the compiler. Will this code withstand the pressure of humans and AIs trying to break it?

AI vs. The Unrepresentable

Let's repeat the same prompt that we tried before and see how the AI manages now. Recall the prompt:

> In @OrderProcessor.java we no longer want to hardcode the
default `Whiskey` value since we don't know whether it's
available today or not. Use @WhiskeyService.java to inject
the value dynamically instead. In Main.java use
@WhiskeyServiceDefault.java. Use `gradlew build` to verify
that everything compiles.

The AI happily chugs along trying to make the same modification it did before:

private CoffeeOrder processIrishCoffee(CoffeeOrderDTO order) {
  return new CoffeeOrder.IrishCoffee(
      order.cream() == null ? Cream.HALF_AND_HALF : order.cream(),
      order.whiskey() == null ? Whiskey.JAMESON : order.whiskey());
      order.whiskey() == null ? whiskeyService.fetchTodayDefaultWhiskey() : order.whiskey());
}

But this fails to compile. The arguments to IrishCoffee cannot be null, but fetchTodayDefaultWhiskey is marked Nullable. The AI reacts with:

● I need to handle the null case from the WhiskeyService.
...
The issue is that fetchTodayDefaultWhiskey() can return
null, but the CoffeeOrder.IrishCoffee constructor expects a
non-null whiskey.

Which is great, and exactly the sort of process that making illegal states unrepresentable should trigger. The AI then suggests the following modification:

private CoffeeOrder processIrishCoffee(CoffeeOrderDTO order) {
  var defaultWhiskey = whiskeyService.fetchTodayDefaultWhiskey();

  // 1
  var whiskey = order.whiskey() != null ? order.whiskey() :
                defaultWhiskey != null ? defaultWhiskey : Whiskey.JAMESON;

  return new CoffeeOrder.IrishCoffee(
      order.cream() == null ? Cream.HALF_AND_HALF : order.cream(),
      order.whiskey() == null ? defaultWhiskey : order.whiskey());
      whiskey);
}

In (1) there is now a fallback in case WhiskeyService produced a null, which keeps the original hardcoded value.

Here the timeline splits. If we aren't carefully reviewing everything the AI is doing, which is getting more and more likely as people offload more code to agents, we might accept this solution. Despite the fact that it doesn't actually fulfill our requirements for not hardcoding values. Such is life. But, crucially, we didn't break any invariants.

On another timeline, we might've noticed that the requirements weren't fulfilled (or the prompt was more insistent about it), in which case we reject this edit. Likely or not, I'll continue on this timeline, just for the sake of seeing where this might lead:

> Never use a hardcoded value

Which produces the following code:

private CoffeeOrder processIrishCoffee(CoffeeOrderDTO order) {
  var whiskey = order.whiskey();

  if (whiskey == null) {
    whiskey = whiskeyService.fetchTodayDefaultWhiskey();
    if (whiskey == null) {
      throw new IllegalStateException("No whiskey available today"); // 1
    }
  }
  return new CoffeeOrder.IrishCoffee(
      order.cream() == null ? Cream.HALF_AND_HALF : order.cream(),
      whiskey);
}

This solution throws an exception when the fetched whiskey is null (1). Seeing how we went in this direction due to a runtime exception being thrown where we didn't expect it, we'll reject this solution as well⁹:

> No exceptions

To which we get this response:

● I understand. Let me think about this differently. Since
the WhiskeyService interface is @Nullable but we need a
non-null value, and you don't want hardcoded values or
exceptions, perhaps the solution is to modify the
WhiskeyServiceDefault to never return null, or change the
interface contract.

Looking at the current WhiskeyServiceDefault implementation,
it actually never returns null - it always returns a random
whiskey. Should I remove the @Nullable annotation from the
interface to reflect this reality?

Nope:

> It's impossible to guarantee that `fetcTodayDefaultWhiskey`
is never `null`

Now the AI goes on a tangent, trying to swap the order for a hardcoded cappuccino if the whiskey is missing. This might be acceptable to some, but not in this house:

> Changing the drink silently is not user friendly

● You're right. If whiskey isn't available, we shouldn't
silently change the drink type. Looking at this problem, I
think we need to modify the CoffeeOrder.IrishCoffee to
accept a @Nullable whiskey parameter, since that's the
reality of the business logic - sometimes whiskey might not
be available.

But no:

> It makes no sense to have Irish coffee without whiskey

● You're absolutely right. An Irish coffee without whiskey
isn't an Irish coffee. Let me look at this from a different
angle - perhaps we need to handle this at the order
processing level by not allowing Irish coffee orders when no
whiskey is available.

Looking at the current flow, when someone orders an Irish
coffee but no whiskey is specified in the order AND the
service returns null (no whiskey available today), we have
an impossible state to represent. // 1

Maybe the solution is to return an Optional<CoffeeOrder> or
similar from the process method, or have a different
approach to handle unavailable ingredients. What would you
prefer as the approach to handle this case where Irish
coffee is ordered but no whiskey is available?

We will stop here, but this is a great interaction. By getting pushed around by the compiler and the illegal state that can't be represented here, the AI reached the conclusion that we have "an impossible state" (1). But crucially with all the constraints in place, we didn't hit the original bug, and we are now forced to consider our design head on. Whether we like the new suggested design is up for debate, but at least now we are actually having this discussion upfront.

This is the magic of making illegal states unrepresentable, not only does it keep us from making mistakes, across code and time; but it also forces us to consider design implications as requirements change. Whether you're a human or an AI, the benefits are the same.

Here we conclude this little demonstration.

Conclusion

It's not the fault of the AI that it can't enforce global invariants, or more generally keep track of the global properties of code as it grows and evolves over time. It, just like us puny humans, was never designed to do that. Maybe one day this will change, but until then, as conscientious software engineers we must utilize whatever tools we have at our disposal to compensate for this deficiency.

The sharp-eyed readers amongst you might've noticed that nothing about the pitfalls in the example code was specifically tailored to the weaknesses of AI tools. A human developer, might fall for the very same things (appropriately scaled into a larger codebase). And that's not a bad thing, it just means that we have decades of experience in tackling these issues. Experience that we can apply to help AIs just as we managed to help humans in the past.

Making illegal states unrepresentable is one approach, or rather a design principle, to enforce global invariants, that we can use. By consistently making illegal states unrepresentable in our code, we offload more responsibility to the type system and the compiler. Tools that were actually designed to enforce global properties of code. From now and till forever, never relenting and never forgetting.

Quoting from Yaron Minsky:

I think that people ... think of the type system as merely a constraint, that is to say, you write code as you did before, and then you see whether the compiler accepts it. As people get more experienced, they start to use the type system as a tool in its own right, and one of the things that is constantly on one's mind is how to pass off as much of the proof burden as possible to the compiler.

We've seen a taste of this responsibility offloading above, it takes some practice, but there are many other techniques that align with this more compiler-aware approach to programming. Not the least of which is to stop lying to the compiler. That is, to stop hiding information from the compiler by, e.g., using type casting, nulls, runtime exceptions, and the like. In a future installment we'll explore how we can limit the number of lies we tell the compiler, making the type system more useful, and the AI tooling less error-prone.

The mantra of "make illegal states unrepresentable" is at least a couple of decades old. Unfortunately for the software industry, to this day it didn't make it into the mainstream. But now, under the barrage of AI-generated code, is the perfect time to start adopting it.

May your illegal AI edits be forevermore unrepresentable...

If you enjoyed this, reach out for a workshop on Functional Programming, where I teach how to apply Functional Programming to improve code quality in any language.

Permalink Twitter Reddit LinkedIn Hacker News Facebook WhatsApp

✓ Link copied to clipboard!

And if not, I gave a talk about this very topic. It's in Scala but can be applied to any statically-typed language. ↩
This example is loosely based on one of the examples in a talk that I gave in the past (translated from Scala to Java). ↩
You can find the full code for the examples in the repo. ↩
Although it doesn't really matter for the purposes of this demonstration, but in this case I'm using Claude Code version 1.0.33, with whatever model Claude chooses to use by default. ↩
You can see the full code with the AI edits on this branch. ↩
For better or for worse, some times, depending on the exact prompt and the state of the code, the AI would circumvent the trap I set for it. But is this good enough for us? Leaving important business invariants to chance like that? ↩
Something that unfortunately actual developers rarely do in real life. ↩
You can find the new code on this branch. ↩
We could ask for a checked exception instead, which is arguably less surprising as it gets tracked by the type system, and we will be forced to handle it. Unfortunately, programmers are historically pretty bad at dealing with checked exceptions, so let's see where we can get to without them. Ideally, whatever you choose should become a prompt rule that the AI always have to follow. ↩

# Make Illegal AI Edits Unrepresentable

The Pitch

The Illegal States

Getting in the Flow

AI vs. Illegal State

The Root Cause

The Unrepresentable

AI vs. The Unrepresentable

Conclusion

# First Make It Correct

# Random Scala Tip #534: Adopt an Error Handling Convention for `Future`

# Make Illegal AI Edits Unrepresentable

The Pitch

The Illegal States

Getting in the Flow

AI vs. Illegal State

The Root Cause

The Unrepresentable

AI vs. The Unrepresentable

Conclusion

Footnotes

# First Make It Correct

# Random Scala Tip #534: Adopt an Error Handling Convention for `Future`