Types as Bits
Bool
Int
(Int, Int)
Maybe Int
- …
We have a conceptual understanding of them by now, but how are they understood by a computer? How is Maybe Int
stored on a hard disk?
A bit is little box that has two states. Zero or a one. On or off. Computer memory is one super long sequence of bits.
Okay, so all we have is a bunch of bits. Now we need to represent everything with that!
Bool
A Bool
value can be either True
or False
. This corresponds exactly to a bit!
An Int
value is some whole number like 0
, 1
, 2
, etc. You cannot fit that in a single bit, so the only other option is to use multiple bits. So normally, an Int
would be a sequence of bits, like these:
We can arbitrarily assign meaning to each of these sequences. So maybe 00000000
is zero and is one. Great! We can just start assigning numbers to bit sequences in ascending order. But eventually we will run out of bits…
By some quick math, eight bits only allow (2^8 = 256) numbers. What about perfectly reasonable numbers like 9000 and 8004?
String
The string "abc"
is the sequence of characters a
b
c
, so we will start by trying to represent characters as bits.
One of the early ways of encoding characters is called ASCII. Just like with integers, they decided to list out a bunch of bit sequences and start assigning values arbitrarily:
So every character needed to fit in eight bits, meaning only 256 characters could be represented! But if you only care about English, this actually works out pretty well. You need to cover 26 lower-case letters, 26 upper-case letters, and 10 numbers. That is 62. There is a bunch of room left for symbols and other weird stuff. You can see what they ended up with .
We have an idea for characters now, but how will the computer know where the String
ends and the next piece of data begins? It is all just bits. Characters look just like Int
values really! So we need some way to mark the end.
These days, languages tend to do this by storing the length of the string. So a string like "hello"
might look something like 5
h
e
l
l
o
in memory. So you know a String
always starts with 32-bits representing the length. And whether the length is 0 or 9000, you know exactly where the characters end.
Custom Types
A custom type is all about combining different types. Those different types may have all sorts of different shapes. We will start with the Color
type:
We can assign each case a number: Red = 0
, , and Green = 2
. Now we can use the Int
representation. Here we only need two bits to cover all the possible cases, so 00
is red, 01
is yellow, 10
is green, and 11
is unused.
But what about custom types that hold additional data? Like Maybe Int
? The typical approach is to set aside some bits to “tag” the data, so we can decide that Nothing = 0
and Just = 1
. Here are some examples:
Nothing
=0
Just 16
=1
00010000
A case
expression always looks at that “tag” before deciding what to do next. If it sees a 0
it knows there is no more data. If it sees a 1
it knows it is followed by a sequence of bits representing the data.
This “tag” idea is similar to putting the length at the beginning of values. The values may be different sizes, but the code can always figure out where they start and end.
Eventually, all values need to be represented in bits. This page gives a rough overview of how that actually works.
Normally there is no real reason to think about this, but I found it to be helpful in deepening my understanding of custom types and case
expressions. I hope it is helpful to you as well!