Types as Sets
One of the most important techniques in Elm programming is to make the possible values in code exactly match the valid values in real life. This leaves no room for invalid data, and this is why I always encourage folks to focus on custom types and data structures.
In pursuit of this goal, I have found it helpful to understand the relationship between types and sets. It sounds like a stretch, but it really helps develop your mindset!
You can think of types as a set of values.
Bool
is the set{ True, False }
Color
is the set{ Red, Yellow, Green }
Int
is the set{ ... -2, -1, 0, 1, 2 ... }
Float
is the set{ ... 0.9, 0.99, 0.999 ... 1.0 ... }
String
is the set{ "", "a", "aa", "aaa" ... "hello" ... }
So when you say x : Bool
it is like saying x
is in the { True, False }
set.
Cardinality
Some interesting things happen when you start figuring out how many values are in these sets. For example the Bool
set { True, False }
contains two values. So math people would say that Bool
has a of two. So conceptually:
- cardinality(
Bool
) = 2 - cardinality(
Color
) = 3 - cardinality(
Int
) = ∞ - cardinality(
Float
) = ∞
This gets more interesting when we start thinking about types like (Bool, Bool)
that combine sets together.
When you combine types with tuples, the cardinalities get multiplied:
- cardinality(
(Bool, Bool)
) = cardinality(Bool
) × cardinality(Bool
) = 2 × 2 = 4 - cardinality(
(Bool, Color)
) = cardinality(Bool
) × cardinality(Color
) = 2 × 3 = 6
To make sure you believe this, try listing all the possible values of (Bool, Bool)
and . Do they match the numbers we predicted? How about for (Color, Color)
?
- cardinality(
(Bool, String)
) = 2 × ∞ - cardinality(
(Int, Int)
) = ∞ × ∞
I personally really like the idea of having two infinities. One wasn’t enough? And then seeing infinite infinities. Aren’t we going to run out at some point?
Addition (Custom Types)
When figuring out the cardinality of a custom type, you add together the cardinality of each variant. Let’s start by looking at some Maybe
and Result
types:
- cardinality(
Result Bool Color
) = cardinality(Bool
) + cardinality(Color
) = 2 + 3 = 5 - cardinality(
Maybe Bool
) = 1 + cardinality(Bool
) = 1 + 2 = 3 - cardinality(
Maybe Int
) = 1 + cardinality(Int
) = 1 + ∞
To persuade yourself that this is true, try listing out all the possible values in the Maybe Bool
and Result Bool Color
sets. Does it match the numbers we got?
Here are some other examples:
type Height
= Inches Int
| Meters Float
-- cardinality(Height)
-- = cardinality(Int) + cardinality(Float)
type Location
= Nowhere
-- cardinality(Location)
-- = 1 + cardinality((Float, Float))
-- = 1 + cardinality(Float) × cardinality(Float)
-- = 1 + ∞ × ∞
Looking at custom types this way helps us see when two types are equivalent. For example, Location
is equivalent to Maybe (Float, Float)
. Once you know that, which one should you use? I prefer Location
for two reasons:
- The code becomes more self-documenting. No need to wonder if
Just (1.6, 1.8)
is a location or a pair of heights. - The
Maybe
module may expose functions that do not make sense for my particular data. For example, combining two locations probably should not work likeMaybe.map2
. Should oneNowhere
mean that everything isNowhere
? Seems weird!
In other words, I write a couple lines of code that are similar to other code, but it gives me a level of clarity and control that is extremely valuable for large code bases and teams.
Thinking of “types as sets” helps explain an important class of bugs: invalid data. For example, say we want to represent the color of a traffic light. The set of valid values are { red, yellow, green } but how do we represent that in code? Here are three different approaches:
type alias Color = { red : Bool, yellow : Bool, green : Bool }
— The idea here is that the idea of “red” is represented byColor True False False
. But what aboutColor True True True
? What does it mean for it to be all the colors at once? This is invalid data. Just like with theString
representation, we end up writing checks in our code and tests to make sure there are no mistakes. In this case, cardinality(Color
) = 2 × 2 × 2 = 8, so there are only 5 invalid values. There are definitely fewer ways to mess up, but we should still have some checks and tests.type Color = Red | Yellow | Green
— In this case, invalid data is impossible. cardinality(Color
) = 1 + 1 + 1 = 3, exactly corresponding to the set of three values in real life. So there is no point checking for invalid color data in our code or tests. It cannot exist!
So the whole point here is that ruling out invalid data makes your code shorter, simpler, and more reliable. By making sure the set of possible values in code exactly matches the set of valid values in real life, many problems just go away. This is a sharp knife!
As your program changes, the set of possible values in code may start to diverge from the set of valid values in real life. I highly recommend revisiting your types periodically to make them match again. This is like noticing your knife has become dull and sharpening it with a whetstone. This kind of maintenance is a core part of programming in Elm.
When you start thinking this way, you end up needing fewer tests, yet having more reliable code. You start using fewer dependencies, yet accomplishing things more quickly. Similarly, someone skilled with a knife probably will not buy a . There is definitely a place for blenders and food processors, but it is smaller than you might think. No one runs ads about how you can be independent and self-sufficient without any serious downsides. No money in that!