Purify Your Tests - Daniel Beskin's Blog

Here's a fun little trick to purify your tests. When I say "purify" I mean it in the pure functional programming sense of making the tests side-effect-free. But more specifically, I'm going to show you a way to avoid mutable mocks.

Testing Side-Effects

To make this concrete, we're going to use an example. Although this is Scala, the technique I'm about to describe is applicable in any statically-typed language with generics/parametric polymorphism, Java¹ to Haskell included.

So suppose you're writing your very cool UberService²:

class UberService(
  fetcher: Fetcher,
  enricher: Enricher,
  bookkeeper: Bookkeeper,
  storage: Storage):

  def fetchAndStore(user: UserID): Unit = // 1
    val data = fetcher.fetch(user) // 2
    val enriched = enricher.enrich(user, data) // 3

    bookkeeper.bookkeep(data, enriched) // 4
    storage.store(enriched) // 5

UberService is the pinnacle of engineering:

It has one method fetchAndStore (1).
As the name states, it does some data-fetching (2).
It then enriches the data (3).
Since this is an important process we do some bookkeeping (4) about the fetched/enriched data.
Lastly, we store the enriched result into some storage (5).

And because this is the pinnacle of engineering, we want UberService to be testable. To that end it doesn't actually do anything on its own, instead it only orchestrates the Fetcher, Enricher, Bookkeeper, and Storage instances. Since they are all abstract traits, we can mock them out in tests, and verify that UberService actually does what's written on the tin.
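The domain types and the four traits are never spelled out in this post (the full code lives in the linked repository). As a minimal sketch, definitions consistent with the snippets here might look like the following. The `value` fields and their types are inferred from how the mocks and the test use them, so treat them as assumptions rather than the actual definitions.

```scala
// Assumed domain types: thin wrappers around a single `value`,
// since the mocks in this post read `user.value` and `data.value`.
final case class UserID(value: Int)
final case class UserData(value: String)
final case class EnrichedUserData(value: String)

// The four collaborators that UberService orchestrates.
trait Fetcher:
  def fetch(user: UserID): UserData

trait Enricher:
  def enrich(user: UserID, data: UserData): EnrichedUserData

// Pure side-effects: nothing useful comes back from these two.
trait Bookkeeper:
  def bookkeep(original: UserData, enriched: EnrichedUserData): Unit

trait Storage:
  def store(data: EnrichedUserData): Unit
```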

First we need to set up some mocks³ ⁴.

Here are the mocks for the Fetcher and Enricher traits. These are easy, they are just pure functions, on every input they return a deterministic output:

object TestFetcher extends Fetcher:
  def fetch(user: UserID): UserData = UserData(s"data: ${user.value}")

object TestEnricher extends Enricher:
  def enrich(user: UserID, data: UserData): EnrichedUserData =
    EnrichedUserData(s"enriched: ${user.value} - ${data.value}")

Next we need to mock the Bookkeeper trait. This one's trickier, since its functionality is a pure side-effect, all we "return" from bookkeep is Unit, which doesn't provide us any new information within the context of the test.

class TestBookkeeper extends Bookkeeper:
  var bookkept: List[(UserData, EnrichedUserData)] = List.empty // 1

  def bookkeep(original: UserData, enriched: EnrichedUserData): Unit =
    bookkept ::= ((original, enriched)) // 2

As a result, to be able to test proper usage of Bookkeeper we have to use some mutable state:

In (1) we initialize a var⁵ to hold the data that will be passed by UberService to our mock.
In (2) we add the data that was passed to the bookkeep method.

And similarly for Storage:

class TestStorage extends Storage:
  var stored: List[EnrichedUserData] = List.empty

  def store(data: EnrichedUserData): Unit =
    stored ::= data

Enough mocks, let's get to business. Now we can write a test for UberService:

"The uber-service" should:
  "fetch the user data, enrich it, and store the results" in:
    // 1
    val bookkeeper = new TestBookkeeper
    val storage = new TestStorage

    val service = new UberService(TestFetcher, TestEnricher, bookkeeper, storage) // 2

    val expectedUserData = UserData(s"data: 5")
    val expectedEnriched = EnrichedUserData("enriched: 5 - data: 5")

    service.fetchAndStore(UserID(5)) // 3

    // 4
    bookkeeper.bookkept shouldBe List((expectedUserData, expectedEnriched))
    storage.stored shouldBe List(expectedEnriched)

The test is pretty straightforward:

We initialize some mocks (1).
Create a new UberService instance (2) with these mocks.
Then execute the fetchAndStore method (3), which is the thing actually being tested.
Finally, we assert on the contents of our mocks (4), making sure they got the right data passed into them.

By doing all that we verified that UberService indeed performs the orchestration expected of it.

But this kind of sucks...

The Problem

The Bookkeeper and Storage mocks are annoying. Writing mutable mocks is tedious and error-prone⁶. Luckily for us UberService is completely synchronous, but what would happen to that poor var if we add some concurrency to the game? Not to mention race conditions and the like.
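To make the concurrency worry concrete: if bookkeep were ever called from several threads at once, the plain var could silently lose updates, and the mock would need some form of synchronization. Here is a sketch of what that might look like, using an AtomicReference. This is not code from the example repository; ConcurrentTestBookkeeper is a hypothetical name, and the types are minimal stand-ins so the snippet compiles on its own.

```scala
import java.util.concurrent.atomic.AtomicReference

// Minimal stand-ins for the post's types (assumed shapes).
final case class UserData(value: String)
final case class EnrichedUserData(value: String)

trait Bookkeeper:
  def bookkeep(original: UserData, enriched: EnrichedUserData): Unit

// Hypothetical thread-safe variant of the mutable mock.
class ConcurrentTestBookkeeper extends Bookkeeper:
  private val ref =
    new AtomicReference[List[(UserData, EnrichedUserData)]](List.empty)

  // Snapshot of everything recorded so far, most recent first.
  def bookkept: List[(UserData, EnrichedUserData)] = ref.get()

  def bookkeep(original: UserData, enriched: EnrichedUserData): Unit =
    // Atomic prepend: retried internally if another thread got there first.
    ref.updateAndGet(list => (original, enriched) :: list)
    ()
```

That is a fair amount of machinery just to observe that a method was called, and the test still has to reach into the mock afterwards.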

Needing to inspect the innards of our mocks to be able to write the test is a definite code smell.

To contrast, the mocks for Fetcher and Enricher are much simpler, they can be modeled by pure functions, given some input they produce some output. We do not have to deal with mutable state to see that they are used properly by UberService. Their outputs can be used as indication that the functionality of Fetcher and Enricher was employed correctly.

It all boils down to a single source of problems: we are trying to test side-effects. Since side-effects are not observable without other side-effects, we have to resort to using mutation to catch them.

But what can we do, Bookkeeper and Storage are side-effectful components. Do we have any other choice?

The Non-Solutions

What makes Fetcher and Enricher nice to work with is that they produce outputs when called, while Bookkeeper and Storage only return Unit. Well, what about adding some outputs to them? But what should we return? Here's one attempt:

trait Bookkeeper:
  def bookkeep(original: UserData, enriched: EnrichedUserData): Bookkept // 1

sealed trait Bookkept
object Bookkept extends Bookkept // 2

In this new edition of Bookkeeper, instead of returning Unit we return a new type called Bookkept (1), which is defined as a singleton (2).

Great, we now have an output, just like Fetcher. Can we get rid of the unpleasant mocks? Nope.

Since Bookkept is a singleton with no information beyond its existence, that is to say, isomorphic to Unit, we cannot use it to glean any information on how Bookkeeper was invoked. This is a dead-end.
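The "isomorphic to Unit" claim can be spelled out in code: there are conversions in both directions, and going there and back always lands on the value you started with. So Bookkept carries exactly as much information as Unit, which is none. A small sketch (toUnit and fromUnit are names made up for this illustration):

```scala
sealed trait Bookkept
object Bookkept extends Bookkept

// Both types have exactly one value, so each direction has only one
// possible implementation, and the round trips are trivially identities.
def toUnit(b: Bookkept): Unit = ()
def fromUnit(u: Unit): Bookkept = Bookkept
```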

Okay, so let's add some information to the output:

trait Bookkeeper:
  def bookkeep(original: UserData, enriched: EnrichedUserData): (UserData, EnrichedUserData) // 1

class UberService(/* same as before */):

  def fetchAndStore(user: UserID): (UserData, EnrichedUserData) =
    // same as before

    val bookkept = bookkeeper.bookkeep(data, enriched) // 2

    storage.store(enriched)

    bookkept // 3

In this version of Bookkeeper we are returning the data that we need for the test (1), a tuple of (UserData, EnrichedUserData). This should give us everything we need. To be able to use that data, we have to hold on to it in fetchAndStore (2), and then return it as the result of fetchAndStore (3).

This works, but feels kind of iffy. Why should the real production code know about the specific data that is needed for the test? And what if we have different kinds of tests that care about different pieces of data? Do we have to accommodate them all in our real code?

The resulting code is both too rigid and concerns itself with something it shouldn't care about. How can we make it both more flexible and more ignorant ⁷?

The Solution

When some code requires both flexibility and ignorance it is very likely that the solution is to add a type parameter. Let me demonstrate:

trait Bookkeeper[A]: // 1
  def bookkeep(original: UserData, enriched: EnrichedUserData): A // 2

Now Bookkeeper takes a type-parameter (1), and returns a value of that type as a result (2). This gives us maximum flexibility. In concrete implementations we can choose A to be whatever we want: Unit for real production code, and (UserData, EnrichedUserData) for test code. And the best part is that UberService doesn't care; it's ignorant of the concrete choice of A:

class UberService[Bookkept]( // 1
  fetcher: Fetcher,
  enricher: Enricher,
  bookkeeper: Bookkeeper[Bookkept], // 2
  storage: Storage):

  def fetchAndStore(user: UserID): Bookkept = // 3
    val data = fetcher.fetch(user)
    val enriched = enricher.enrich(user, data)

    val bookkept = bookkeeper.bookkeep(data, enriched)

    storage.store(enriched)

    bookkept // 4

This code is very similar to the earlier attempt at returning the bookkept value. Unlike last time, though, Bookkept is now a type-parameter (1), so UberService doesn't "know" what it's going to be, and doesn't care. The only thing we know is that the type parameter of the input Bookkeeper (2) is going to be consistent with the output type of fetchAndStore (3). All UberService does with the value is pass it along (4).

Because the code is now completely generic (or ignorant) we can accommodate both the real use-case (Unit) and the test use-case ((UserData, EnrichedUserData)), without further modifications to UberService.

To wit, here's what the mock for Bookkeeper will look like now:

object TestBookkeeper extends Bookkeeper[(UserData, EnrichedUserData)]:

  def bookkeep(original: UserData, enriched: EnrichedUserData): (UserData, EnrichedUserData) =
    (original, enriched)

No more mutable state, we choose the type we need for the test, and the resulting mock is basically a pure function⁸, just like the mocks for Fetcher and Enricher.
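Since the type is chosen per mock, a different test that cares about a different piece of data can pick a different A, with no change to UberService. As a sketch of that flexibility, here is a second, hypothetical mock that only reports the length of the enriched payload. LengthBookkeeper and NoOpStorage are invented for this example, and the domain types are assumed stand-ins; the rest follows the snippets above.

```scala
// Assumed stand-ins for the post's domain types.
final case class UserID(value: Int)
final case class UserData(value: String)
final case class EnrichedUserData(value: String)

trait Fetcher:
  def fetch(user: UserID): UserData
trait Enricher:
  def enrich(user: UserID, data: UserData): EnrichedUserData
trait Bookkeeper[A]:
  def bookkeep(original: UserData, enriched: EnrichedUserData): A
trait Storage:
  def store(data: EnrichedUserData): Unit

class UberService[Bookkept](
  fetcher: Fetcher,
  enricher: Enricher,
  bookkeeper: Bookkeeper[Bookkept],
  storage: Storage):

  def fetchAndStore(user: UserID): Bookkept =
    val data = fetcher.fetch(user)
    val enriched = enricher.enrich(user, data)
    val bookkept = bookkeeper.bookkeep(data, enriched)
    storage.store(enriched)
    bookkept

object TestFetcher extends Fetcher:
  def fetch(user: UserID): UserData = UserData(s"data: ${user.value}")

object TestEnricher extends Enricher:
  def enrich(user: UserID, data: UserData): EnrichedUserData =
    EnrichedUserData(s"enriched: ${user.value} - ${data.value}")

// Hypothetical: storage is not the subject of this test, so it does nothing.
object NoOpStorage extends Storage:
  def store(data: EnrichedUserData): Unit = ()

// Hypothetical second mock: this test only cares about the payload length.
object LengthBookkeeper extends Bookkeeper[Int]:
  def bookkeep(original: UserData, enriched: EnrichedUserData): Int =
    enriched.value.length

// Bookkept is inferred as Int; UberService itself is untouched.
val service = new UberService(TestFetcher, TestEnricher, LengthBookkeeper, NoOpStorage)
val enrichedLength: Int = service.fetchAndStore(UserID(5))
```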

The test itself can easily check that Bookkeeper was called with the correct input:

"The uber-service" should:
  "fetch the user data, enrich it, and store the results" in:
    // ...

    val service = new UberService(TestFetcher, TestEnricher, TestBookkeeper, storage)

    // ...

    val bookkeeperResult = service.fetchAndStore(UserID(5)) // 1

    bookkeeperResult shouldBe (expectedUserData, expectedEnriched) // 2

    // ...

We call fetchAndStore as before (1). But instead of Unit we have an actual result. Since we chose the type parameter of Bookkeeper to be (UserData, EnrichedUserData), this is exactly the data we had before in our mock, and we can assert on it as before (2).

What's more, it's not possible for UberService to "lie" to us with that Bookkeeper output. From the point of view of UberService there's no way to produce a Bookkept value without calling Bookkeeper. Unless you're cheating⁹, parametric values cannot be faked into existence¹⁰.
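A tiny, stripped-down illustration of why that holds (passAlong is a made-up stand-in for fetchAndStore, with a function in place of the Bookkeeper): when the result type is a type parameter, the compiler rejects any attempt to conjure a value of it out of thin air.

```scala
// With a fully generic A, the only non-cheating way to return an A
// is to obtain it from the thing we were handed.
def passAlong[A](produce: () => A): A =
  produce()

// "Lying" variants like these do not compile (kept as comments):
//   def lie1[A](produce: () => A): A = ()       // a Unit is not an A
//   def lie2[A](produce: () => A): A = "fake"   // a String is not an A
// What remains are the cheats: null, casts, exceptions, and the like.
```

So if a test gets back the expected value, the call that produces it must have happened.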

While our test is becoming more civilized, the production code knows nothing about it. The production implementation can remain the same by setting A = Unit:

class ProductionBookkeeper(/* ... */) extends Bookkeeper[Unit]
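To give a feel for the production side, here is a sketch of a Unit-returning implementation. It is deliberately not the real ProductionBookkeeper, whose constructor arguments are elided above; LoggingBookkeeper and its log parameter are hypothetical, and a real implementation might write to a database or an audit service instead.

```scala
// Assumed stand-ins for the post's domain types.
final case class UserData(value: String)
final case class EnrichedUserData(value: String)

trait Bookkeeper[A]:
  def bookkeep(original: UserData, enriched: EnrichedUserData): A

// Hypothetical production-style implementation: A = Unit, and the
// work happens entirely as a side-effect (here, emitting a log line).
class LoggingBookkeeper(log: String => Unit) extends Bookkeeper[Unit]:
  def bookkeep(original: UserData, enriched: EnrichedUserData): Unit =
    log(s"bookkept: ${original.value} -> ${enriched.value}")
```

Wired into UberService, this makes fetchAndStore return Unit again, exactly like the original version.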

Rinse and Repeat

We can repeat the same trick with Storage.

Add a type-parameter:

trait Storage[A]:
  def store(data: EnrichedUserData): A

Propagate the new type through UberService:

class UberService[Bookkept, Stored](
    fetcher: Fetcher,
    enricher: Enricher,
    bookkeeper: Bookkeeper[Bookkept],
    storage: Storage[Stored]):

  def fetchAndStore(user: UserID): (Bookkept, Stored) =
    val data = fetcher.fetch(user)
    val enriched = enricher.enrich(user, data)

    val bookkept = bookkeeper.bookkeep(data, enriched)
    val stored = storage.store(enriched)

    (bookkept, stored)

Create a mock that's a pure function:

object TestStorage extends Storage[EnrichedUserData]:
  def store(data: EnrichedUserData): EnrichedUserData = data

And use it in our test:

"The uber-service" should:
  "fetch the user data, enrich it, and store the results" in:
    val service = new UberService(TestFetcher, TestEnricher, TestBookkeeper, TestStorage)

    val expectedUserData = UserData(s"data: 5")
    val expectedEnriched = EnrichedUserData("enriched: 5 - data: 5")

    val (bookkeeperResult, storageResult) = service.fetchAndStore(UserID(5))

    bookkeeperResult shouldBe (expectedUserData, expectedEnriched)
    storageResult shouldBe expectedEnriched

Look at this lovely test! It's now completely pure. No more side-effects or mutable state. We are testing using pure functions. The best kind of test.

This test has been officially purified.

Complications?

The eagle-eyed readers amongst you might've noticed the potential for proliferation of type parameters in our code. For every Unit-returning method we now have to create a fake type-parameter. Is it worth it?

Apart from the benefit of (much) nicer tests, I would argue that the resulting type-signature is actually more helpful:

def fetchAndStore(user: UserID): Unit
// vs.
def fetchAndStore(user: UserID): (Bookkept, Stored)

The first signature describes an opaque side-effect, while the second signature is more declarative, stating that the result of fetchAndStore is something being bookkept and stored. I think that's a win.

In the next part we'll see more advantages of being declarative this way in detail.

But I'm Already Doing Pure FP

You might also object that you're already doing pure FP with ZIO/Cats Effect/Finally Tagless. And that's great. But every time you see something like IO[Unit], the same considerations as above apply. You'll need to test a side-effect, and resort to some sort of mutability¹¹.
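The type-parameter trick carries over to effect types unchanged: keep the wrapper, replace only the Unit inside it. Here is a sketch using the standard library's Future as a stand-in for IO, so that it runs without extra dependencies. AsyncBookkeeper and TestAsyncBookkeeper are hypothetical names, and the domain types are assumed stand-ins.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.*

// Assumed stand-ins for the post's domain types.
final case class UserData(value: String)
final case class EnrichedUserData(value: String)

// Hypothetical asynchronous variant: Future[Unit] becomes Future[A].
trait AsyncBookkeeper[A]:
  def bookkeep(original: UserData, enriched: EnrichedUserData): Future[A]

// The test mock stays a pure function wrapped in an already-completed Future.
object TestAsyncBookkeeper extends AsyncBookkeeper[(UserData, EnrichedUserData)]:
  def bookkeep(
    original: UserData,
    enriched: EnrichedUserData): Future[(UserData, EnrichedUserData)] =
    Future.successful((original, enriched))

// In a test we can simply await the result and assert on it.
val asyncResult: (UserData, EnrichedUserData) = Await.result(
  TestAsyncBookkeeper.bookkeep(UserData("d"), EnrichedUserData("e")),
  1.second)
```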

Unless you're doing Finally Tagless, in which case in your tests you can replace F[Unit] with something like a Writer (for example Writer[L, Unit] for some log type L) and still have a pure test. Though I still think that adding a type-parameter might be worthwhile.

In any case, the general lesson is still:

Friends, don't let friends use Unit as a return value.

Conclusion

We've seen how we can mechanically turn our side-effecting, mutable tests into nice pure functions, just by adding some type-parameters. The results are nicer, more declarative tests.

In the next part, in the spirit of "one 'simple' design change, a panoply of outcomes", we are going to take a look at some other benefits of parameterizing our code this way.

Happy purifying!

If you enjoyed this, reach out for a workshop on Functional Programming, where I teach how to apply Functional Programming to improve code quality in any language.


Footnotes

¹ Gasp...
² The full code for the examples is available on GitHub.
³ You could also use a mocking library for this purpose, but I personally prefer to avoid them.
⁴ Don't catch me on the terminology here. Mocks, spies, stunt doubles, whatever kids these days call them, for our purposes here they are all the same.
⁵ Double gasp...
⁶ Okay, using a mocking library would make this less tedious. It can still be error-prone. And in any case I have no incentive to make mocking easy, I want people to feel the pain of mocking, so that they gravitate to code that doesn't rely on it that much. Arguably, such code would be of better quality compared to code that requires many mocks to test.
⁷ I mean this in the Orwellian sense of "ignorance is strength". And see this talk for more.
⁸ Equivalent to the identity function.
⁹ By "cheating" I mean things like subverting the type-system with nulls, casts, and the like. See the Scalazzi subset of Scala.
¹⁰ This is a loose application of "parametricity".
¹¹ The same reasoning applies to non-pure container types, like Future.
