# Random Scala Tip #624: The Pitfalls of Option Blindness
Given this class:
```scala
case class User(
  id: UserId,
  email: Option[Email],
  address: Option[Address],
  posts: Option[List[Post]],
  lastLogin: Option[Timestamp]
)
```
What does `Option` on each of the fields actually mean?
Well, after spending hours digging through the code, the documentation[1], and use sites, I gleaned the following:
- `email` is actually mandatory, but that requirement was only added recently, so it was made optional for backwards compatibility with older clients
- `address` is taken from user input; it's not mandatory and can be missing
- `posts` is fetched from a database, but sometimes the database is down, so we fall back to `None`, which is distinct from `Some(Nil)` and should be handled appropriately
- `lastLogin` will be empty if the user never logged in previously; we need to provide a special greeting when a user logs in for the first time
We have four distinct meanings for `Option`, but only one data type to represent them all. This phenomenon is called "`Option` blindness".
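To make the blindness concrete, here is a minimal sketch. The `Post` type and the `postCount` helper are placeholders assumed for illustration, not code from the system described above.

```scala
// Placeholder domain type, assumed only for this sketch.
final case class Post(title: String)

// A tempting helper: treat a missing list as an empty one.
// Nothing in the type says that None means "the database was down".
def postCount(posts: Option[List[Post]]): Int =
  posts.getOrElse(Nil).size
```

With this helper, `postCount(None)` (database down) and `postCount(Some(Nil))` (no posts yet) both return `0`, and nothing in the signature warns the caller that the two cases were meant to be handled differently.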
## Differently Typed Blindnesses
Boolean blindness is a well-known phenomenon. It has a somewhat lesser-known cousin called algebraic blindness. That's when we use very generic algebraic data types, like `Option`, `Either`, and `These`/`Ior`[2], to represent domain-specific data, and in doing so lose the domain-specific context in our code.
Here I'm picking specifically on `Option` because it's so common. As it's often touted as a solution to the "null problem", it's easy to think of it as a "never wrong" solution, even more so if you're new to functional programming. On top of that, `Option` enjoys a privileged status in the wider Scala ecosystem, especially in various serialization libraries, where it commonly gets special handling, making it all the more tempting to use it all over the place.
## The Tip
Before wrapping anything in `Option`, consider whether you should use a more meaningful domain-specific data type instead.
## Examples
There are plenty of domain-specific meanings that an `Option` can take, but I'll use the four examples above. Hopefully this will illustrate the general principle well enough.
### Backwards Compatibility
This one is particularly annoying, as it tends to get more widespread as the system evolves over time. You add a new field to a class that's constrained by backwards compatibility (e.g., it is used for communication, or stored in a database), and then you mark it `Option`al in perpetuity. Time passes, and you no longer have any idea whether the field can be missing "for real", or whether it's just some vestigial `Option` supporting a no-longer-relevant version of the code.
Instead, I would suggest using a custom generic type, isomorphic to `Option`, to communicate the backwards compatibility concern directly[3]:
```scala
enum BackCompat[+A]:
  case Present(value: A)
  case Missing
```
How you end up treating such `BackCompat` values is very domain-dependent: maybe you can replace them with `A` over time, maybe not. But in any case, at least now you know why they are there and can treat them accordingly.
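As a rough sketch of what a use site might look like, assume a placeholder `Email` type and a hypothetical `contactLine` helper (neither comes from a real codebase). The case names make the reason for the missing value visible wherever it is handled:

```scala
enum BackCompat[+A]:
  case Present(value: A)
  case Missing

// Placeholder domain type, assumed only for this sketch.
final case class Email(value: String)

// Hypothetical policy: users created by older clients are asked for an email.
def contactLine(email: BackCompat[Email]): String =
  email match
    case BackCompat.Present(e) => s"Contact: ${e.value}"
    case BackCompat.Missing    => "Please add an email address"
```

If the day comes when old clients are gone, searching for `BackCompat` finds every field that is a candidate for becoming a plain `A`.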
### Missing User Inputs
For user input that is not mandatory I'm on the fence: this might be the case where just using `Option` is the right fit, with all the convenience that you get from using a standard data type. The data is there or not, and that's all you care about. On the other hand, knowing that the data came from user input, and that it was explicitly not provided, might be relevant to your domain, in which case a custom `Option`-like data type (see above) may be in order.
This dilemma highlights the point that choosing the correct way to model your data is, to an extent, an art rather than just a science.
### Feature Toggle
The next example is using `Option` to indirectly communicate that a certain feature in the system (data fetching in the example) was turned off. That means that we have to remember that `None` has a special meaning. It's not just an empty value; it's empty because a feature was turned off, and we might want to apply special handling to it. Using the terminology from the boolean blindness post, the provenance of the `Option` is just as important as the value of the data itself.
Instead of using `Option`, we can communicate the provenance with yet another domain-specific, `Option`-like data type:
```scala
enum FetchedData[+A]:
  case Available(value: A)
  case Disabled
```
Now, every time someone has a `FetchedData` value, they have to decide what to do with it based on the informatively named cases. No longer do we have to remember the special meaning of `None`; the type system will guide us in the right direction.
One day, if and when we have more data-fetching toggles, it'll be easy to add them to the custom `FetchedData` type, and then let the compiler guide us in fixing up the code[4]. Piggybacking on `Option` for this might get cumbersome.
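Here is a hedged sketch of how that evolution might look. The `Unreachable` case, the `Post` type, and the `renderPosts` helper are all hypothetical additions for illustration. Once the new case exists, any pattern match that does not handle it is flagged by the exhaustivity check:

```scala
// Placeholder domain type, assumed only for this sketch.
final case class Post(title: String)

enum FetchedData[+A]:
  case Available(value: A)
  case Disabled
  case Unreachable // hypothetical extra case, not part of the original type

def renderPosts(posts: FetchedData[List[Post]]): String =
  posts match
    case FetchedData.Available(Nil) => "No posts yet"
    case FetchedData.Available(ps)  => ps.map(_.title).mkString(", ")
    case FetchedData.Disabled       => "Posts are turned off"
    case FetchedData.Unreachable    => "Posts are temporarily unavailable"
```

Note how "no posts yet" and "posts are turned off" are now separate branches, where with `Option` they were `Some(Nil)` and `None`, told apart only by convention.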
### System State
The last example is when `Option` is used to represent different states of the system. Now `None` means that the user never logged in, which is information that we have to hold in our heads rather than in the type system[5].
Unsurprisingly, the solution would be to create a domain-specific data type. But to add some variety, for this case I will go with a non-generic type. The "last logged in" state doesn't have a meaningful value that's not a `Timestamp` (at least in my interpretation of this example).
The state machine for this part of the system can be represented as:
```scala
enum LastLogin:
  case At(value: Timestamp)
  case Never
```
With this little state machine, the compiler itself will remind us of whatever special behavior we need to apply when a user logs in for the first time.
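A minimal sketch of the greeting logic could look like the following. The `Timestamp` representation, the `greeting` function, and the message wording are assumptions made for this example:

```scala
// Placeholder representation, assumed only for this sketch.
final case class Timestamp(epochSeconds: Long)

enum LastLogin:
  case At(value: Timestamp)
  case Never

// Hypothetical greeting logic: the first-login case has to be handled explicitly.
def greeting(name: String, lastLogin: LastLogin): String =
  lastLogin match
    case LastLogin.Never  => s"Welcome aboard, $name!"
    case LastLogin.At(ts) => s"Welcome back, $name. Last login: ${ts.epochSeconds}"
```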
## Trade Offs
Like most things in life, using custom data types is a trade off. Even if we ignore the boilerplate of actually defining a new type every time, we still pay in at least two ways:
- We lose all the standard `Option` functions
- We cannot automatically participate in whatever special treatment `Option` gets in various libraries
We can try to mitigate the first point by defining conversions to/from `Option` and then using those to gain back some of the `Option` functionality[6]. Even better, if you're into functional programming, you can use some standard typeclasses to imbue your custom types with `Option`-like functionality[7]. Some libraries will even let you derive those typeclasses automatically.
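As one possible shape for the first mitigation, the sketch below adds explicit conversions and a hand-written `map` to `BackCompat`. The method names `toOption`, `fromOption`, and `map` are my own choices here. In a codebase that uses a typeclass library, a `Functor` instance could stand in for the hand-written `map`.

```scala
enum BackCompat[+A]:
  case Present(value: A)
  case Missing

  // Explicit conversion: the standard Option combinators are one call away.
  def toOption: Option[A] = this match
    case Present(value) => Some(value)
    case Missing        => None

  // Hand-written map, so simple transformations do not need a round trip.
  def map[B](f: A => B): BackCompat[B] = this match
    case Present(value) => Present(f(value))
    case Missing        => Missing

object BackCompat:
  // Conversion in the other direction, e.g. at a serialization boundary.
  def fromOption[A](option: Option[A]): BackCompat[A] = option match
    case Some(value) => Present(value)
    case None        => Missing
```

Keeping the conversions explicit means that dropping down to `Option` stays a visible, deliberate step at the call site.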
For the second point, special `Option` handling is very common in serialization libraries, e.g., converting an empty `Option` to `null` in JSON. If you're lucky, whatever special functionality `Option` gets is guided by implicits, in which case custom data types can participate just as well (at the price of some more boilerplate). If you're not that lucky, and `Option` handling is hardcoded into the library, then you can try to petition the library authors to change that. Feel free to use this post as a form of justification.
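To illustrate the "lucky" case, here is a sketch built around a hypothetical, hand-rolled `JsonEncoder` typeclass. It is not the API of any real serialization library, and it does no string escaping. The point is only that when `Option`'s special treatment is an ordinary given instance, a custom type can supply one too:

```scala
// Hypothetical typeclass standing in for a library's encoder; not a real API.
trait JsonEncoder[A]:
  def encode(value: A): String

object JsonEncoder:
  // Naive string encoding without escaping, good enough for a sketch.
  given stringEncoder: JsonEncoder[String] =
    value => "\"" + value + "\""

  // The usual special case: an empty Option becomes null.
  given optionEncoder[A](using inner: JsonEncoder[A]): JsonEncoder[Option[A]] =
    value => value.fold("null")(inner.encode)

enum FetchedData[+A]:
  case Available(value: A)
  case Disabled

object FetchedData:
  // The custom type opts into the same treatment with its own instance.
  given fetchedDataEncoder[A](using inner: JsonEncoder[A]): JsonEncoder[FetchedData[A]] =
    value =>
      value match
        case Available(v) => inner.encode(v)
        case Disabled     => "null"

def toJson[A](value: A)(using encoder: JsonEncoder[A]): String =
  encoder.encode(value)
```

Whether `Disabled` should really serialize to `null` is a domain decision. The extra boilerplate is the one instance in the companion object.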
Despite these cons, I still think that the maintainability gains we get from domain-specific, `Option`-like data types often outweigh the downsides.
May you never be blinded by `Option` ever again. Till next time!
## Footnotes
[1] Obviously unmaintained and misleading.
[3] I'm using Scala 3 syntax, which is pleasantly compact. The same applies to Scala 2, albeit with noisier syntax.
[4] Assuming that you have pattern match exhaustivity errors turned on, and if not, go now and enable them. What are you still doing here? Go!
[5] Going down the route of moving system states into the type system is a good first step towards "making illegal states unrepresentable".
[6] If you're feeling particularly adventurous you can make those conversions implicit.