Here's a fun little trick to purify your tests. When I say "purify" I mean it in the pure functional programming sense of making the tests side-effect-free. But more specifically, I'm going to show you a way to avoid mutable mocks.
Testing Side-Effects
To make this concrete, we're going to use an example. Although this is Scala, the technique I'm about to describe is applicable in any statically-typed language with generics/parametric polymorphism, Java [1] to Haskell included.
So suppose you're writing your very cool UberService [2]:
class UberService(
    fetcher: Fetcher,
    enricher: Enricher,
    bookkeeper: Bookkeeper,
    storage: Storage):
  def fetchAndStore(user: UserID): Unit = // 1
    val data = fetcher.fetch(user) // 2
    val enriched = enricher.enrich(user, data) // 3
    bookkeeper.bookkeep(data, enriched) // 4
    storage.store(enriched) // 5
UberService is the pinnacle of engineering:
- It has one method, fetchAndStore (1).
- As the name states, it does some data-fetching (2).
- It then enriches the data (3).
- Since this is an important process, we do some bookkeeping (4) about the fetched/enriched data.
- Lastly, we store the enriched result into some storage (5).
And because this is the pinnacle of engineering, we want UberService to be testable. To that end, it doesn't actually do anything on its own; it only orchestrates the Fetcher, Enricher, Bookkeeper, and Storage instances. Since they are all abstract traits, we can mock them out in tests, and verify that UberService actually does what's written on the tin.
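The supporting types aren't spelled out here, so below is a minimal sketch of what they might look like. The shapes are inferred from how the types are used in the examples that follow; the field names and the Int inside UserID are assumptions for illustration, and the full code on GitHub may well differ.

```scala
// Hypothetical domain types: the `value` fields are inferred from the
// `user.value` / `data.value` usages in the mocks, not taken from the real code.
final case class UserID(value: Int)
final case class UserData(value: String)
final case class EnrichedUserData(value: String)

// The four collaborators that UberService orchestrates.
trait Fetcher:
  def fetch(user: UserID): UserData

trait Enricher:
  def enrich(user: UserID, data: UserData): EnrichedUserData

// These two only return Unit, which is what makes them awkward to test.
trait Bookkeeper:
  def bookkeep(original: UserData, enriched: EnrichedUserData): Unit

trait Storage:
  def store(data: EnrichedUserData): Unit
```

Using case classes means the values get structural equality for free, which is what lets the tests compare expected and actual data directly.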
First we need to set up some mocks [3][4].
Here are the mocks for the Fetcher and Enricher traits. These are easy: they are just pure functions that return a deterministic output for every input:
object TestFetcher extends Fetcher:
  def fetch(user: UserID): UserData = UserData(s"data: ${user.value}")

object TestEnricher extends Enricher:
  def enrich(user: UserID, data: UserData): EnrichedUserData =
    EnrichedUserData(s"enriched: ${user.value} - ${data.value}")
Next we need to mock the Bookkeeper trait. This one's trickier, since its functionality is a pure side-effect: all we "return" from bookkeep is Unit, which doesn't provide us any new information within the context of the test.
class TestBookkeeper extends Bookkeeper:
  var bookkept: List[(UserData, EnrichedUserData)] = List.empty // 1

  def bookkeep(original: UserData, enriched: EnrichedUserData): Unit =
    bookkept ::= ((original, enriched)) // 2
As a result, to be able to test proper usage of Bookkeeper we have to use some mutable state:
- In (1) we initialize a var [5] to hold the data that will be passed by UberService to our mock.
- In (2) we add the data that was passed to the bookkeep method.
And similarly for Storage:
class TestStorage extends Storage:
  var stored: List[EnrichedUserData] = List.empty

  def store(data: EnrichedUserData): Unit =
    stored ::= data
Enough mocks, let's get down to business. Now we can write a test for UberService:
"The uber-service" should:
  "fetch the user data, enrich it, and store the results" in:
    // 1
    val bookkeeper = new TestBookkeeper
    val storage = new TestStorage

    val service = new UberService(TestFetcher, TestEnricher, bookkeeper, storage) // 2

    val expectedUserData = UserData(s"data: 5")
    val expectedEnriched = EnrichedUserData("enriched: 5 - data: 5")

    service.fetchAndStore(UserID(5)) // 3

    // 4
    bookkeeper.bookkept shouldBe List((expectedUserData, expectedEnriched))
    storage.stored shouldBe List(expectedEnriched)
The test is pretty straightforward:
- We initialize some mocks (1).
- Create a new UberService instance (2) with these mocks.
- Then execute the fetchAndStore method (3), which is the thing actually being tested.
- And we assert the contents of our mocks (4), making sure they got the right data passed into them.
By doing all that we verified that UberService indeed performs the orchestration expected of it.
But this kind of sucks...
The Problem
The Bookkeeper and Storage mocks are annoying. Writing mutable mocks is tedious and error-prone [6]. Luckily for us, UberService is completely synchronous, but what would happen to that poor var if we added some concurrency to the game? Concurrent calls could race on it and silently lose updates, and the like.
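To get a feel for the extra ceremony concurrency would demand, here's a rough sketch of a thread-safe variant of the Storage mock. ConcurrentTestStorage and storeConcurrently are hypothetical names that are not part of the original example, and AtomicReference is just one of several ways to do this.

```scala
import java.util.concurrent.atomic.AtomicReference

final case class EnrichedUserData(value: String)

trait Storage:
  def store(data: EnrichedUserData): Unit

// Hypothetical thread-safe mock: the plain `var` is replaced by an
// AtomicReference so that concurrent calls to `store` cannot lose updates.
class ConcurrentTestStorage extends Storage:
  private val ref = new AtomicReference[List[EnrichedUserData]](List.empty)

  def store(data: EnrichedUserData): Unit =
    ref.updateAndGet(current => data :: current)
    ()

  def stored: List[EnrichedUserData] = ref.get()

// Calls `store` from several threads at once and waits for all of them.
def storeConcurrently(storage: ConcurrentTestStorage, calls: Int): Unit =
  val threads = (1 to calls).map(i =>
    new Thread(() => storage.store(EnrichedUserData(s"item $i"))))
  threads.foreach(_.start())
  threads.foreach(_.join())
```

Even with this in place, the order of the stored items depends on thread scheduling, so a test would have to compare them as a set rather than as a list. That's a lot of machinery for what is supposed to be a simple mock.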
Needing to inspect the innards of our mocks to be able to write the test is a definite code smell.
In contrast, the mocks for Fetcher and Enricher are much simpler. They can be modeled by pure functions: given some input, they produce some output. We do not have to deal with mutable state to see that they are used properly by UberService. Their outputs can be used as an indication that the functionality of Fetcher and Enricher was employed correctly.
It all boils down to a single source of problems: we are trying to test side-effects. Since side-effects are not observable without other side-effects, we have to resort to using mutation to catch them.
But what can we do? Bookkeeper and Storage are side-effectful components. Do we have any other choice?
The Non-Solutions
What makes Fetcher and Enricher nice to work with is that they produce outputs when called, while Bookkeeper and Storage only return Unit. Well, what about adding some outputs to them? But what should we return? Here's one attempt:
trait Bookkeeper:
  def bookkeep(original: UserData, enriched: EnrichedUserData): Bookkept // 1

sealed trait Bookkept
object Bookkept extends Bookkept // 2
In this new edition of Bookkeeper, instead of returning Unit we are returning a new type called Bookkept (1), which is defined as a singleton (2).
Great, we now have an output, just like Fetcher. Can we get rid of the unpleasant mocks? Nope.
Since Bookkept is a singleton with no information beyond its existence, that is to say, isomorphic to Unit, we cannot use it to glean any information on how Bookkeeper was invoked. This is a dead-end.
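To see the dead-end in action, here's a small sketch. SingletonBookkeeper is a hypothetical implementation, and the case classes are assumed shapes for the data types; two completely different invocations produce results that cannot be told apart.

```scala
final case class UserData(value: String)
final case class EnrichedUserData(value: String)

sealed trait Bookkept
object Bookkept extends Bookkept

trait Bookkeeper:
  def bookkeep(original: UserData, enriched: EnrichedUserData): Bookkept

// Hypothetical implementation: whatever the inputs are, the only thing
// it can return is the single Bookkept value.
object SingletonBookkeeper extends Bookkeeper:
  def bookkeep(original: UserData, enriched: EnrichedUserData): Bookkept = Bookkept

val first = SingletonBookkeeper.bookkeep(UserData("a"), EnrichedUserData("b"))
val second = SingletonBookkeeper.bookkeep(UserData("x"), EnrichedUserData("y"))
// `first` and `second` are the same value, so the inputs left no trace.
```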
Okay, so let's add some information to the output:
trait Bookkeeper:
  def bookkeep(original: UserData, enriched: EnrichedUserData): (UserData, EnrichedUserData) // 1

class UberService(/* same as before */):
  def fetchAndStore(user: UserID): (UserData, EnrichedUserData) =
    // same as before
    val bookkept = bookkeeper.bookkeep(data, enriched) // 2
    storage.store(enriched)
    bookkept // 3
In this version of Bookkeeper we are returning the data that we need for the test (1), a tuple of (UserData, EnrichedUserData). This should give us everything we need. To be able to use that data, we have to hold on to it in fetchAndStore (2), and then return it as the result of fetchAndStore (3).
This works, but feels kind of iffy. Why should the real production code know about the specific data that is needed for the test? And what if we have different kinds of tests that care about different pieces of data? Do we have to accommodate them all in our real code?
The resulting code is both too rigid and concerned with something it shouldn't care about. How can we make it both more flexible and more ignorant [7]?
The Solution
When some code requires both flexibility and ignorance it is very likely that the solution is to add a type parameter. Let me demonstrate:
trait Bookkeeper[A]: // 1
  def bookkeep(original: UserData, enriched: EnrichedUserData): A // 2
Now Bookkeeper takes a type-parameter (1), and returns a value of that type as a result (2). This gives us maximum flexibility. In concrete implementations we can choose A to be whatever we want: it can be Unit for real production code, and it can be (UserData, EnrichedUserData) for test code. And the best part is that UberService doesn't care; it's ignorant of the concrete choice of A:
class UberService[Bookkept]( // 1
    fetcher: Fetcher,
    enricher: Enricher,
    bookkeeper: Bookkeeper[Bookkept], // 2
    storage: Storage):
  def fetchAndStore(user: UserID): Bookkept = // 3
    val data = fetcher.fetch(user)
    val enriched = enricher.enrich(user, data)
    val bookkept = bookkeeper.bookkeep(data, enriched)
    storage.store(enriched)
    bookkept // 4
This code is very similar to the earlier attempt that returned the bookkeeper's output from fetchAndStore. Unlike last time though, Bookkept is now a type-parameter (1), so UberService doesn't "know" what it's going to be, and doesn't care. The only thing we know is that the type parameter of the input Bookkeeper (2) is going to be consistent with the output type of fetchAndStore (3). All UberService does is pass the value along (4).
Because the code is now completely generic (or ignorant) we can accommodate both the real use-case (Unit) and the test use-case ((UserData, EnrichedUserData)), without further modifications to UberService.
To wit, here's what the mock for Bookkeeper will look like now:
object TestBookkeeper extends Bookkeeper[(UserData, EnrichedUserData)]:
  def bookkeep(original: UserData, enriched: EnrichedUserData): (UserData, EnrichedUserData) =
    (original, enriched)
No more mutable state. We choose the type we need for the test, and the resulting mock is basically a pure function [8], just like the mocks for Fetcher and Enricher.
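Since the choice of A is entirely up to the implementation, different tests are free to pick different outputs without touching the service. As a rough sketch (both mocks below are hypothetical and not part of the original example, and the case classes are assumed shapes):

```scala
final case class UserData(value: String)
final case class EnrichedUserData(value: String)

trait Bookkeeper[A]:
  def bookkeep(original: UserData, enriched: EnrichedUserData): A

// Hypothetical mock for a test that only cares about the original data.
object OriginalOnlyBookkeeper extends Bookkeeper[UserData]:
  def bookkeep(original: UserData, enriched: EnrichedUserData): UserData =
    original

// Hypothetical mock for a test that checks a relationship between the two
// inputs: here, whether the enriched value still mentions the original one.
object ConsistencyBookkeeper extends Bookkeeper[Boolean]:
  def bookkeep(original: UserData, enriched: EnrichedUserData): Boolean =
    enriched.value.contains(original.value)
```

Both are still pure functions, and neither requires the production code to know which piece of data a given test is interested in.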
The test itself can easily check that Bookkeeper was called with the correct input:
"The uber-service" should:
  "fetch the user data, enrich it, and store the results" in:
    // ...
    val service = new UberService(TestFetcher, TestEnricher, TestBookkeeper, storage)
    // ...
    val bookkeeperResult = service.fetchAndStore(UserID(5)) // 1

    bookkeeperResult shouldBe (expectedUserData, expectedEnriched) // 2
    // ...
We call fetchAndStore as before (1). But instead of Unit we have an actual result. Since we chose the type parameter of Bookkeeper to be (UserData, EnrichedUserData), this is exactly the data we had before in our mock, and we can assert on it as before (2).
What's more, it's not possible for UberService to "lie" to us with that Bookkeeper output. From the point of view of UberService there's no way to produce a Bookkept value without calling Bookkeeper. Unless you're cheating [10], parametric values cannot be faked into existence [11].
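Here's a tiny illustration of that reasoning, using a hypothetical orchestrate function as a stripped-down stand-in for UberService. It is only a sketch of the idea, not a proof.

```scala
// `orchestrate` knows nothing about A: it cannot construct one, inspect one,
// or pick a default. Barring nulls, casts, and exceptions, the only way to
// return an A is to actually call `effect`.
def orchestrate[A](input: String, effect: String => A): A =
  val prepared = input.toUpperCase
  effect(prepared)

// The caller decides what A is, and gets back exactly what `effect` produced.
val echoed: String = orchestrate("abc", (s: String) => s)
val measured: Int = orchestrate("abc", (s: String) => s.length)
```

If the body of orchestrate skipped the call to effect, it would simply have no A to return and would fail to compile. The same holds for fetchAndStore and its Bookkept result.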
While our test is becoming more civilized, the production code knows nothing about it. The production implementation can remain the same by setting A = Unit:
class ProductionBookkeeper(/* ... */) extends Bookkeeper[Unit]
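To make the production side a bit more tangible, here's a self-contained sketch of how such a wiring might look. LoggingBookkeeper, its sink parameter, and the Demo/NoOp objects are all hypothetical stand-ins; the real constructor arguments are elided above and I'm not trying to guess them.

```scala
final case class UserID(value: Int)
final case class UserData(value: String)
final case class EnrichedUserData(value: String)

trait Fetcher:
  def fetch(user: UserID): UserData
trait Enricher:
  def enrich(user: UserID, data: UserData): EnrichedUserData
trait Bookkeeper[A]:
  def bookkeep(original: UserData, enriched: EnrichedUserData): A
trait Storage:
  def store(data: EnrichedUserData): Unit

class UberService[Bookkept](
    fetcher: Fetcher,
    enricher: Enricher,
    bookkeeper: Bookkeeper[Bookkept],
    storage: Storage):
  def fetchAndStore(user: UserID): Bookkept =
    val data = fetcher.fetch(user)
    val enriched = enricher.enrich(user, data)
    val bookkept = bookkeeper.bookkeep(data, enriched)
    storage.store(enriched)
    bookkept

// Hypothetical production-style bookkeeper: it performs a side-effect
// through `sink` (an assumed parameter) and returns Unit, so A = Unit.
class LoggingBookkeeper(sink: String => Unit) extends Bookkeeper[Unit]:
  def bookkeep(original: UserData, enriched: EnrichedUserData): Unit =
    sink(s"bookkept ${original.value} -> ${enriched.value}")

// Hypothetical stand-ins for the remaining collaborators.
object DemoFetcher extends Fetcher:
  def fetch(user: UserID): UserData = UserData(s"data: ${user.value}")
object DemoEnricher extends Enricher:
  def enrich(user: UserID, data: UserData): EnrichedUserData =
    EnrichedUserData(s"enriched: ${user.value} - ${data.value}")
object NoOpStorage extends Storage:
  def store(data: EnrichedUserData): Unit = ()

// With A = Unit the result type is Unit again, just like the original code.
def runDemo(): Unit =
  val service =
    new UberService(DemoFetcher, DemoEnricher, new LoggingBookkeeper(println), NoOpStorage)
  service.fetchAndStore(UserID(5))
```

From the caller's perspective nothing has changed: fetchAndStore still returns Unit when it's wired with a Unit-returning bookkeeper.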
Rinse and Repeat
We can repeat the same trick with Storage.
Add a type-parameter:
trait Storage[A]:
  def store(data: EnrichedUserData): A
Propagate the new type through UberService:
class UberService[Bookkept, Stored](
    fetcher: Fetcher,
    enricher: Enricher,
    bookkeeper: Bookkeeper[Bookkept],
    storage: Storage[Stored]):
  def fetchAndStore(user: UserID): (Bookkept, Stored) =
    val data = fetcher.fetch(user)
    val enriched = enricher.enrich(user, data)
    val bookkept = bookkeeper.bookkeep(data, enriched)
    val stored = storage.store(enriched)
    (bookkept, stored)
Create a mock that's a pure function:
object TestStorage extends Storage[EnrichedUserData]:
  def store(data: EnrichedUserData): EnrichedUserData = data
And use it in our test:
"The uber-service" should:
  "fetch the user data, enrich it, and store the results" in:
    val service = new UberService(TestFetcher, TestEnricher, TestBookkeeper, TestStorage)

    val expectedUserData = UserData(s"data: 5")
    val expectedEnriched = EnrichedUserData("enriched: 5 - data: 5")

    val (bookkeeperResult, storageResult) = service.fetchAndStore(UserID(5))

    bookkeeperResult shouldBe (expectedUserData, expectedEnriched)
    storageResult shouldBe expectedEnriched
Look at this lovely test! It's now completely pure. No more side-effects or mutable state. We are testing using pure functions. The best kind of test.
This test has been officially purified.
Complications?
The eagle-eyed readers amongst you might've noticed the potential for proliferation of type parameters in our code. For every Unit-returning method we now have to create a fake type-parameter. Is it worth it?
Apart from the benefit of (much) nicer tests, I would argue that the resulting type-signature is actually more helpful:
def fetchAndStore(user: UserID): Unit
// vs.
def fetchAndStore(user: UserID): (Bookkept, Stored)
The first signature describes an opaque side-effect, while the second signature is more declarative, stating that the result of fetchAndStore is something being bookkept and stored. I think that's a win.
In the next part we'll see more advantages of being declarative this way in detail.
But I'm Already Doing Pure FP
You might also object that you're already doing pure FP with ZIO/Cats Effect/Finally Tagless. And that's great. But every time you see something like IO[Unit], the same considerations as above apply. You'll need to test a side-effect, and resort to some sort of mutability [9].
Unless you're doing Finally Tagless, in which case in your tests you can replace F[Unit] with something like Writer[Log, Unit] (where Log is whatever you want to record) and still have a pure test. Though I still think that adding a type-parameter might be worthwhile.
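For the curious, here's roughly what the Writer route could look like. This is a sketch under several assumptions: the Writer below is a minimal hand-rolled version (in practice it would come from a library such as Cats), and the tagless Bookkeeper[F[_]] shape, the Call and Logged aliases, and WriterBookkeeper are all hypothetical names.

```scala
final case class UserData(value: String)
final case class EnrichedUserData(value: String)

// Minimal hand-rolled Writer: a value paired with an accumulated log.
final case class Writer[L, A](log: List[L], value: A):
  def map[B](f: A => B): Writer[L, B] = Writer(log, f(value))
  def flatMap[B](f: A => Writer[L, B]): Writer[L, B] =
    val next = f(value)
    Writer(log ++ next.log, next.value)

// Tagless-style algebra: the effect type F is chosen by the implementation.
trait Bookkeeper[F[_]]:
  def bookkeep(original: UserData, enriched: EnrichedUserData): F[Unit]

// In tests, F is a Writer that records every call as a pair of inputs.
type Call = (UserData, EnrichedUserData)
type Logged[A] = Writer[Call, A]

object WriterBookkeeper extends Bookkeeper[Logged]:
  def bookkeep(original: UserData, enriched: EnrichedUserData): Logged[Unit] =
    Writer(List((original, enriched)), ())

// Two calls in sequence; the log accumulates both, without any mutation.
val program: Logged[Unit] =
  for
    _ <- WriterBookkeeper.bookkeep(UserData("a"), EnrichedUserData("b"))
    _ <- WriterBookkeeper.bookkeep(UserData("c"), EnrichedUserData("d"))
  yield ()
```

The value is still Unit, but the calls end up in program.log, where a test can assert on them.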
In any case, the general lesson is still:
Friends, don't let friends use Unit as a return value.
Conclusion
We've seen how we can mechanically turn our side-effecting, mutable tests into nice pure functions, just by adding some type-parameters. The results are nicer, more declarative tests.
In the next part, in the spirit of "one 'simple' design change, a panoply of outcomes", we are going to take a look at some other benefits of parameterizing our code this way.
Happy purifying!
[1] Gasp...
[2] The full code for the examples is available on GitHub.
[3] You could also use a mocking library for this purpose, but I personally prefer to avoid them.
[4] Don't catch me on the terminology here. Mocks, spies, stunt doubles, whatever kids these days call them, for our purposes here they are all the same.
[5] Double gasp...
[6] Okay, using a mocking library would make this less tedious. It can still be error-prone. And in any case I have no incentive to make mocking easy, I want people to feel the pain of mocking, so that they gravitate to code that doesn't rely on it that much. Arguably, such code would be of better quality compared to code that requires many mocks to test.
[7] I mean this in the Orwellian sense of "ignorance is strength". And see this talk for more.
[8] Equivalent to the identity function.
[9] The same reasoning applies to non-pure container types, like Future.
[10] By "cheating" I mean things like subverting the type-system with nulls, casts, and the like. See the Scalazzi subset of Scala.
[11] This is a loose application of "parametricity".