Preface
This document is uniquely tailored to those who need to read, rather than write Haskell code: auditors, scientists, managers, testers etc.
This excerpt of Haskell for Readers focuses on reading Haskell's types, without getting bogged down in the details and complexities of the rest of the Haskell code. By understanding the types, we can understand much of the architecture of a Haskell program.
Form
This document is not (necessarily) a self-contained tutorial; it is rather the base for an interactive lecture, given by a real instructor. In such a lecture, some holes will be filled as we go, and the questions from the audience form a crucial part of the learning experience.
This document is also meant to be more on the concise side, assuming the audience is shorter on time than on wits, and in a small, live workshop, the lecturer can add details, come up with more examples and slow down as needed.
This makes these notes less ideal for independent study, but that said, it should be possible to work attentively through them and still learn a lot.
Audience
I expect the audience to be familiar with programming and computer science in general, but do not assume prior knowledge of functional programming (or, in case you are worried about this, category theory).
Acknowledgments and license
The creation of this material was sponsored by the DFINITY Foundation, and is shared with the public under the terms of the Creative Commons Attribution 4.0 International License. You can view the source on GitHub of this document, and submit improvements there.
Reading Haskell modules
In Haskell, every file is a Haskell module, and modules are used to organize and namespace the code.
Normally, a Haskell module named Foo.Bar.Baz
lives in a file Foo/Bar/Baz.hs
, and begins with
module Foo.Bar.Baz where
Haskell module names are always capitalized.
The rest of the module consists of declarations: Values, functions, types, type synonyms etc. The order of declarations in a module is completely irrelevant; things do not need to be declared before they are used. This allows the author to sort functions by topic, or by relevance, rather than by dependency, and it is not uncommon to first show the main entry-point of a module, and put all the helper functions it uses below.
A good starting point when reading a module is to look at what types and functions are exported from the module. Many modules list their exports in the module definition.
module Foo.Bar.Baz (Baz, parseBaz, formatBaz, promptBaz, amazingBaz) where
When there's no explicit export list, a module exports all of its top-level declarations.
So, what is a top-level declaration in Haskell? Well, Haskell is a language with significant indentation (similar to Python), and top-level declarations are the ones that are not indented at all.
Try to scan over the example of a Haskell module below and see if you can find its top-level declarations. Don't get hung up on trying to understand the syntax, or what the code does, just look at the shape of the code and pick out the names of the things this module exports.
module Foo.Bar.Baz where
import Data.Maybe
import Control.Applicative
data Baz = Baz Integer
deriving (Eq, Ord)
promptBaz :: IO Baz
promptBaz = do
putStr "Input desired Baz value, from 1-100: "
response <- getLine
case parseBaz response of
Nothing -> invalidinput "Sorry, but you need to enter a number."
Just b
| b < Baz 1 -> invalidinput "Baz value can't be smaller than 1"
| b > Baz 100 -> invalieinput "Baz value can't be larger than 100"
| b == amazingBaz -> do
putStrLn "That's an amazing Baz!"
return b
| otherwise -> return b
where
invalidinput reason = do
putStrLn reason
promptBaz -- loop
parseBaz :: String -> Maybe Baz
parseBaz s = Baz <$> readMaybe s
formatBaz :: Baz -> String
formatBaz (Baz n) = show n
amazingBaz :: Baz
amazingBaz = Baz 42
If you guessed that the module exports Baz
, promptBaz
, parseBaz
,
formatBaz
, and amazingBaz
, you were right! Those are all of the
top-level declarations in the module. The important lines for our purposes
are these:
data Baz = Baz Integer
promptBaz :: IO Baz
parseBaz :: String -> Maybe Baz
formatBaz :: Baz -> String
amazingBaz :: Baz
If you can read a Haskell module and pick out top-level declarations like those from the rest of the code, you're well on your way to being able to think about the interface provided by that module. To fully understand it, we need to talk about types.
Types
Haskell has a strong static type system, which is essentially a way for you to communicate with the compiler. You can ask the compiler “what do you know about this function? what can it take, what kind of things does it return?”. And you can tell the compiler “this function ought to take this and return that (and please tell me if you disagree)”.
In fact, many Haskellers prefer to do type-driven development: First think about and write down the types of the functions they need to create, and then think about implementing them.
Besides communicating with the compiler, types are also crucial in communicating with your fellow developers and/or users of your API. For many functions, the type alone, or the type and the name, is sufficient to tell you what it does.
Basic types
Each top-level value and function in a Haskell module has a type, and these types are almost always written down before the code.
amazingBaz :: Baz
amazingBaz = Baz 42
The first line is a type annotation, i.e. a term (amazingBaz
) followed
by two colons, followed by its type (Baz
). So amazingBaz
is a
constant value of type Baz
.
More common than constant values are functions:
formatBaz :: Baz -> String
formatBaz (Baz n) = show n
The arrow in the type of formatBaz
tells us that it is a function.
It takes a Baz
as an argument and returns a String
as a result.
A function that takes two values will have a type with two arrows in it, for example:
increaseBaz :: Baz -> Integer -> Baz
increaseBaz (Baz a) b = Baz (a + b)
The types of functions can get much more complicated than that, with many arguments passed to it. And a function can take another function as an argument, making it a higher-order function.
Can you guess how many parameters this function takes from its type annotation?
genericPromptBaz :: (String -> Maybe Baz) -> Baz -> (Baz -> String) -> IO Baz
If you guessed three, you're right; it takes a function of type
String -> Maybe Baz
, a value of type Baz
and another function of type
Baz -> String
.
Such type annotations often get too long to fit on a single line, so bear in mind that this is another way to format the same type annotation:
genericPromptBaz
:: (String -> Maybe Baz)
-> Baz
-> (Baz -> String)
-> IO Baz
Polymorphism
So far we've seen functions that operate on fixed data types like String
and Baz
. The name of a data type always starts with a capital letter.
But sometimes you'll see a function whose type annotation contains
lower-case types.
genericPrompt
:: (String -> Maybe t)
-> t
-> (t -> String)
-> IO t
But what is this type t
? There is not, actually, a type called t
.
Instead, this is a type variable, meaning that the function genericPrompt
can be used with any type. Any lower-case identifier in a type is a type
variable (not just t
), and concrete types are always upper-case.
To use genericPrompt
with any particular type, t
gets filled in with
the type. It could be used with parameters based on Baz
or instead
with numbers, Booleans, or even with functions.
Bear in mind that type variables like t
can become any type, but it has the
be the same type everywhere in the type annotation.
Constrained types (a first glimpse)
Type annotations for polymorphic functions often include one more bit of syntax.
genericPrompt :: Eq t => (String -> Maybe t) -> t -> IO t
The part after the =>
is what we expect: The arguments of the function
and its return type.
The part before the =>
is new: It is a constraint, and it limits
which types t
can be instantiated with to those that can be compared for
equality.
This Eq
thing is not some built-in magic, but rather a type class,
another very powerful and important feature of Haskell, which we will dive
into separately later.
Algebraic data types
The function type is very expressive, and one can model many data structures purely with functions. But of course it is more convenient to use dedicated data structures. There are a number of data structure types that come with the standard library, in particular tuples, lists, the Maybe
type. But it is more instructive to first look at how we can define our own.
We can declare new data types using the data
keyword, the name of the type, the =
sign, and then a list of constructors, separated by pipes (|
). These declarations are a bit odd, since they use =
although it is not really an equality, but let us look at it step by step.
Enumerations
In the simplest case, we can use this to declare an enumeration type:
data Suit = Diamonds | Clubs | Hearts | Spades
From now on, we can use the constructors, e.g. Diamonds
as values of type Suit
. This is how we create values of type Suit
-- and it is the only way, so we know that every value of type Suit
is, indeed, one of these four constructors.
Note that Suit
is a type, i.e. something you can use in type signatures, while Diamonds
and the other constructors are values, i.e. something you can use in your function definitions.
Product types
So far, the constructors were just plain values. But we can also turn them into “containers” of sort, where we can store other values. As an basic example, maybe we want to introduce a type for complex numbers:
data Complex = C Integer Integer
(Mathematically educated readers please excuse the use of Integers
here.)
This creates a new type Complex
, with a constructor C
. But C
itself is not a value of type Complex
, but rather it is a function that creates values of type Complex
and, crucially, it is the only way of creating values of type Complex
. In the ghci
repl, we can ask for the type of C
and see that it is
indeed just a function:
Prelude> :type C
C :: Integer -> Integer -> Complex
Sum types
A type like Complex
, with exactly one constructor, is called a product type. But we can of course have types with more than one constructor and constructor arguments:
data Riemann = Complex Complex | Infinity
This declares a new type Riemann
that can be built using one of these two constructors:
- The constructor
Complex
, which takes one argument, of typeComplex
. Types and terms (including constructors) have different namespaces, so we can have a type calledComplex
, and a constructor calledComplex
, and they can be completely unrelated. This can be confusing, but is rather idiomatic. - The constructor
Infinity
takes no arguments, and simply is a value of typeRiemann
itself.
A data type that has more than one constructor is commonly called a sum type. Because data
allows you to build types from sums and products, these types are called algebraic data types (ADTs).
Recursive data types
It it worth pointing out that it is completely fine to have a constructor argument of the type that we are currently defining. This way, we obtain a recursive data type, and this is the foundation for many important data structures, in particular lists and trees of various sorts. Here is a simple example, a binary tree with numbers on all internal nodes:
data Tree = Leaf | Node Integer Tree Tree
This can be read as “a value of type Tree
is either a Leaf
, or it is a Node
that contains a value of type Integer
and references to two subtrees.”
Polymorphic data types
The tree data type declared in the previous section ought to be useful not just for integers, but maybe for any type. But it would be seriously annoying to have to create a new tree data type for each type we want to store in the tree. Therefore, we have polymorphic data types. In the example of the tree, we can write:
data Tree a = Leaf | Node a (Tree a) (Tree a)
The a
here is a type variable, because it occurs in the place of a type, but is lower-case. When we want to use this tree data type with a concrete type, say Integer
, we simply write Tree Integer
:
insert :: Integer -> Tree Integer -> Tree Integer
We can also write functions that work on polymorphic trees, i.e. which we can use on any Tree
, no matter what type the values in the nodes are. A good example is:
size :: Tree a -> Integer
size Leaf = 0
size (Node _ t1 t2) = 1 + size t1 + size t2
Functions in data types
Maybe this is obvious to you, after the emphasis on functions earlier, but it is still worth pointing out that data type can also store functions. This blurs the distinction between data and code some more, as this nice example shows:
data Stream a b
= NeedInput (a -> Stream a b)
| HasOutput b (Stream a b)
| Done
The type Stream a b
models a state machine that consumes values of type a
, produces values of type b
, and maybe eventually stops. Such a machine is in one of three states:
- Waiting for input. This uses the
NeedInput
constructor, which carries a function that consumes a type of valuea
and returns the new state of the machine. - Producing output. This uses the
HasOutput
constructor, which stores the output value of typeb
, and the subsequent state of the machine. Done
.
Predefined data types
We intentionally discussed the mechanisms of algebraic data types first, so that we can explain the most common data types in the standard library easily.
Booleans
The values True
and False
are simply the constructors of a data type defined as
data Bool = False | True
Maybe
A very common use case for algebraic data types is to capture the idea of a type whose values “maybe contain nothing, or just a value of type a
”. Because this is so common, such a data type is predefined:
data Maybe a = Nothing | Just a
You might see Maybe
, for example, in the return type of a function that deserializes a binary or textual representation of a type, for example:
parseBar :: String -> Maybe Bar
Such an operation can fail, and if the input is invalid, it would return Nothing
. As a user of such a function, the only way to get to the value inside the Maybe
is to pattern-match on the result, which forces you to think about and handle the case where the result is Nothing
.
This is much more robust than the common idiom in C, where you have to remember to check for particular error values (-1, or NULL
), or Go, where you get both a result and a separate error code, but you can still be lazy and use the result without checking the error code.
A big part of Haskell’s reputation as a language that makes it easier to write correct code relies on the use of data types to precisely describe the values you are dealing with.
Either
With maybe we can express “one or none”. Sometimes we want “one or another” type. For this, the standard library provides
data Either a b = Left a | Right b
Commonly, this type is used for computations that can fail, but that provide some useful error messages when they fail:
parseBar :: String -> Either ParseError Bar
This gives us the same robustness benefits of Maybe
, but also a more helpful error messages. If used in this way, then the Left
value is always used for the error or failure case, and the Right
value for when everything went all right.
Tuples
Imagine you are writing a function that wants to return two numbers -- say, the last digit and the rest of the number. The way to do that that you know so far would require defining a data type:
data TwoIntegers = TwoIntegers Integer Integer
splitLastDigit :: Integer -> TwoIntegers
splitLastDigit n = TwoIntegers (n `div` 10) (n `mod` 10)
Clearly, the concept of “passing around two values together“ is not particularly tied to Integer
, and we can use polymorphism to generalize this definition:
data Two a b = Two a b
splitLastDigit :: Integer -> Two Integer Integer
splitLastDigit n = Two (n `div` 10) (n `mod` 10)
And because this is so useful, Haskell comes with built-in support for such pairs, including a nice and slim syntax:
data (a,b) = (a,b)
splitLastDigit :: Integer -> (Integer, Integer)
splitLastDigit n = (n `div` 10, n `mod` 10)
Besides tuples, which store two values, there are triples, quadruples and, in general, n-tuples of any size you might encounter. But really, if these tuples get larger than two or three, the code starts to smell.
Their size is always fixed and statically known at compile time, and you can have values of different types as components of the tuple. This distinguishes them from the lists we will see shortly.
The unit type
There is also a zero-tuple, so to say: The unit type written ()
with only the value ()
:
data () = ()
While this does not look very useful yet, we will see that it plays a crucial role later when we encounter types such as IO ()
Lists
In the previous section we defined trees using a recursive data type. It should be obvious that we can define lists in a very analogous way:
data List a = Empty | Link a (List a)
This data structure is so ubiquitous in functional programming that it not only comes with the standard library, it also has very special, magic syntax:
data [a] = [] | a : [a]
In words: The type [a]
is the type of lists with values of type a
. Such a list is either the empty list, written as []
, or it is a non-empty list containing of a head x
of type a
, and a tail xs
, and is written as x:xs
. Note that the constructor (:)
, called “cons”, is using operator syntax.
Lists are very useful for many applications, but they are not a particularly high-performance data structure -- random access and concatenation is expensive, and they use quite a bit of memory. Depending on the application, other types like arrays/vectors, finger trees, difference lists, maps or sets might be more suitable.
Characters and strings
Haskell has built-in support for characters and text. A single character has type Char
, and is written in single quotes, e.g. 'a'
, '☃'
, '\''
, '\0'
, '\xcafe'
. These characters are Unicode code points, and not just 7 or 8 bit characters.
The built-in type String
is just an alias for [Char]
, i.e. a list of characters. Haskell supports special built-in syntax for strings, using double quotes, but this is just syntactic sugar to the list syntax.
String
also has the same performance issues as lists: While it is fine to use them in non-critical parts of the code (diagnostic and error messages, command line and configuration file parsing, filenames), String
is usually the wrong choice if large amounts of strings need to be processed, e.g. in a templating library. Additionally libraries provide more suitable data structures, in particular ByteString
for binary data and Text
for human-readable text.
Records
Assume you want to create a type that represents an employee in a HR database. There are a fair number of fields to store -- name, date of birth, employee number, room, login handle, public key etc. You could use a tuple with many fields, or create your own data type with a constructor with many fields, but either way you will have to address the various fields by their position, which is verbose, easy to get wrong, and hard to extend.
In such a case, you can use records. These allow you to give names to the fields of a constructor.
data Employee = Employee
{ name :: String
, room :: Integer
, pubkey :: ByteString
}
Newtypes
Sometimes you will see a type declaration that uses newtype
instead of data
:
newtype Baz = Baz Integer
For all purposes relevant to us so far you can mentally replace newtype
with data
. There are difference in memory representation (a newtype
is “free” in some sense), but that is, at this level, irrelevant for us.
Type synonyms
Haskell allows you to introduce new names for existing types. One example is the type String
, which is defined as
type String = [Char]
With this declaration, you can use String
instead of [Char]
in your type signatures. They are completely interchangeable, and a value of type String
is still just a list of characters.
So type synonyms do not introduce any kind of type safety, they merely make types more readable.
Code structure small and large
The next big topic we need to learn is how programmers structure their code. This happens on multiple levels
- within a file: functions, type signatures, and documentation fragments are arranged.
- within a project (library, package): code is spread out in different files, and imported from other files.
- between projects: packages are versioned, equipped with meta-data, and depend on each other.
Comments
Of course, Haskell supports comments. There are line comments and multi-line comments:
answer = 42 -- but what is the question?
{-
In the following code, we write a function that correctly tells
us whether a Turing machine halts:
-}
halts :: TuringMachine -> Bool
halts turing_machine = halts turing_machine
Importing other modules
Obviously, the point of having multiple files of Haskell code is to use the code from one in the other. This is achieved using import
statements, which must come right after the module
header, and before any declarations.
So if we have a file Target.hs
with content
module Target where
who :: String
who = "world"
and another file Tropes.hs
with content
module Tropes where
import Target
greeting :: String
greeting = "Hello " ++ who ++ "!"
then the use of who
in greeting
refers to the definition in the file Target.hs
.
We could also write
greeting :: String
greeting = "Hello " ++ Target.who ++ "!"
and use the fully qualified name of who
. This can be useful for disambiguation, or simply for clarity.
If we only ever intend to refer the things we import from a module by their qualified names, then we can use a qualified import:
import qualified Target
This does not bring any unqualified names into scope.
And if the module has a long name, we can shorten it:
import qualified Target as T
and write T.who
. This is a common idiom for modules like Data.Text
that export many names that would otherwise clash with names from the prelude.
The standard library, called base
, comes with many modules you can import in addition to the Prelude
module, which is always imported implicitly.
Import lists
If we do not want to import all names of another module, we can import just a specific selection, e.g.:
import Data.Maybe (mapMaybe)
This makes it easier for someone reading the code to locate where a certain function or data type is from, and it makes the code more robust against breakage when a new version of the other module starts exporting additional names. These would not silently override other names, but cause compiler errors about ambiguous names.
You can import types just as well, just include them in the list. To import constructors (which look like types), you have to list them after the type they belong to. So if we put our definitions of Complex
and Riemann
into a file Riemann.hs
, namely
module Riemann where
data Complex = C Integer Integer
data Riemann = Complex Complex | Infinity
then you can import everything using
import Riemann (Complex(C), Riemann(Complex, Infinity))
or, shorter,
import Riemann (Complex(..), Riemann(..))
Export lists and abstract types
You can not only restrict what you import, but also what you export. To do so, you list the names of functions, types, etc. that you want to export after the module name:
module Riemann (Complex, Riemann(..)) where
A short export list is a great help when trying to understand the role and purpose of a module: If it only exports one or a small number of functions, it is clear that these are the (only) entry points to the code, and that all other declarations are purely internal, and may be refactored without affecting anything else.
By excluding the constructors of a data type from the export list, as we did in this example with the Complex
type, we can make this type abstract: Users of our module now have no knowledge of the internal structure of Complex
, and they are unable to create or arbitrarily inspect values of type Complex
. Instead, they are only able to do so using the other functions that we export along with Complex
. This way we can ensure certain invariant in our types – think of a search tree with the invariant that it is sorted – or reserve the ability to change the shape of the type without breaking depending code.
Proper user of abstract types greatly helps to make code more readable, more maintainable and more robust.
Haskell packages
Zooming out some more, we come across packages: A package is a collection of modules that are bundled under a single package name. A package contains meta-data (name, version number, author, license...). Packages declare which other packages they depend upon, together with version ranges. All this meta-data can be found in the Cabal file called foo.cabal
in the root directory of the project.
Almost all publicly available Haskell packages are hosted centrally on Hackage, including the haddock-generated documentation and cross-linked source code. They can be easily installed using the cabal
tool, or alternative systems like stack
or nix
.
The packages on Hackage cover many common needs and it is expected that a serious Haskell project depends on dozen of Haskell packages from Hackage.
Imperative (looking) Haskell
Beware: This chapter is full of half-truths and glossing over technical details. Imagine plenty of “it looks as if” and “one can think of this as” sprinkled throughout it. Nevertheless, it is useful to get you started.
IO
-functions
Haskell functions are pure functions in the mathematical sense: Given some input, they calculate some output, but nothing else can happen, and nothing besides the arguments can influence the result. This is great, but how can Haskell programs then write to files, or respond to network requests, or come up with random numbers?
The solution are IO
-functions. These functions can be executed, and when such a function is executed, it can do all these nasty things, before returning a value. Here is a selection of IO
-functions available by default:
getLine :: IO String
putStrLn :: String -> IO ()
readFile :: FilePath -> IO String
writeFile :: FilePath -> String -> IO ()
You can see that these functions may have arguments, just as normal functions. The important bit is the return type, which is IO Something
. This indicates that these functions can be executed, and that they have to be executed before we get our hands on the result.
Not all functions have an interesting result (e.g. putStrLn
does not); this is where the unit type comes in handy.
The M word
Sometimes the types of functions will look similar to the IO
-functions, but
using some other word than "IO".
sendMessage :: String -> Net ()
receiveMessage :: Net String
Much as readFile
has a return type of IO String
, receiveMessage
has Net String
, and this is is a good hint that it also can be executed
and when executed it probably does some impure IO under the hood
before returning a value.
IO
and Net
are what is known as monads, but you only need to worry
about that when digging into their implementation.
do
notation
Often code using IO
or other monads uses a special syntax,
called do
-notation, that allows writing code in an imperative style.
Here is an example:
main :: IO ()
main = do
putStrLn "Which file do you want to copy?"
from <- getLine
putStrLn "Where do you want to copy it to?"
to <- getLine
content <- readFile from
putStrLn ("Read " ++ show (length content) ++ " bytes.")
writeFile to content
putStrLn "Done copying."
We can compile and run this program, and it indeed copies a file:
$ ghc --make copy.hs
[1 of 1] Compiling Main ( copy.hs, copy.o )
Linking copy ...
$ ./copy
Which file do you want to copy?
copy.hs
Where do you want to copy it to?
copy2.hs
Read 287 bytes.
Done copying.
$ diff copy.hs copy2.hs
Looking at the code, it doesn’t look much different than the equivalent in a language like C or Python. Things to notice:
- The
main
function of the module is special. Just like in C, it is the entry point for a compiled Haskell program. When we run the program, then themain
function is executed. This is the only way to start executingIO
-functions -- we cannot do that just nilly-willy within other code. - The body of the
main
function is written as ado
block, which clearly signposts the imperative nature of this code: It is a sequence of things todo
. - Every line below the
do
block is one execution of anIO
-function. The first one, for example, prints a question on the terminal. - The
main
function has typeIO ()
. So it is one of theseIO
-functions as well.
Type classes
The language features we have seen so far can be found, with slight variations, in most functional programming languages. In this chapter, we will look at a feature that Haskell is particularly renowned for: Type classes.
Let me start by pointing out what type classes are not: They are not classes as we know them from object oriented programming, so please do not try to attempt to understand them by analogy to that.
We've already seen the Eq
type class used in a constraint on a polymorphic
function type:
genericPrompt :: Eq t => (String -> Maybe t) -> t -> IO t
The other parts of syntax that you will see are declaring type classes, and adding instances to type classes.
The Eq
type class is declared in the Prelude
with this code:
class Eq a where
(==) :: a -> a -> Bool
That says that ==
is an operator name that can be overloaded,
by declaring a class with it as a method. With that defined,
it's allowed to use ==
with every type that is an instance of Eq
.
Here is code that declares instances of Eq
for Complex
and Riemann
:
instance Eq Complex where
C x1 y1 == C x2 y2 = x1 == x2 && y1 == y2
instance Eq Riemann where
Complex c1 == Complex c2 = c1 == c2
Infinity == Infinity = True
_ == _ = False
Don't worry about the implementation details, but notice that this
defines different ==
functions for the different types.
Often instances of common type classes are derived as part of the definition of a data type.
data Baz = Baz Integer
deriving (Eq, Ord)
The deriving
causes the Haskell compiler to automatically generate
the necessary code to make Eq
and Ord
instances for the Baz
data type.
Modules will often declare type classes of their own, as well as making data types that are instances of many other type classes.