A brief introduction to Haskell
So, what I really wanted to do was to make a post about how functional programming concepts are useful after a discussion that came up in the shoutbox over my suggested use of a function as the second argument of string.replace in @Scuzz’s omdb plugin thread, but I realized that some of the more subtle examples are hard to show without rigorous syntax. So, I thought I would use this post to outline a bit of Haskell syntax which will touch on some functional concepts and lay the ground work for understanding examples in a later post arguing the utility of functional programming. DISCLAIMER: I’m not an expert in Haskell, nor have I built any real software with it, so forgive me if I remain in the conceptual domain
So, here we go!
First, if you want to play around with Haskell, it seems the go to is the Glasgow Haskell Compiler (GHC). Check out the Installation page at haskell.org
(P.S. on mac brew install ghci)
Maths
Arithmetic
As with all of our favorite interpreted languages, we can sit at the interpreter and do math all day:
> 2 + 4
6
> 2 * 4
8
> 2 ** 4 -- 2 ^ 4 (line comments in Haskell begin with --)
16.0
Great, so we know we have some predefined types and operations (technically speaking in the manner we’ve presented above, those literals are not primitives; they are actually boxed, but we’ll get there in time).
To exit the interpreter, use :quit
Variables
Like other languages, we can bind an expression to a name.
> let x = 10
> x
10
Variables MUST begin with a lowercase letter.
Unlike other languages, we cannot modify the value of x in a literal sense. We can rebind x, but this will create a new object rather than updating the old one. This leads to an important result:
> let x = 10
> x
10
> let x = x + 10
> x
In our favorite imperative language, we would expect to see the value of x updated to 20. However in Haskell it doesn’t evaluate.
Why? because we have defined x in terms of itself.
If we try to expand it, we get x + 10. Expand that, we get x + 10 + 10.
If that doesn’t make sense, think about it this way:
what we said is “where ever you see x replace it with x + 10”.
Now, if I asked you “what is x?” here, you can only answer “x + 10”
As you can see, this can never resolve to a value.
Functions
Of course, we can write our own math functions
> let addOne x = x + 1
> addOne 1
2
The syntax for defining a function is
let <function name> <variables> = <expression>
The syntax for calling is just
<function name> variables.
Function names MUST begin with a lowercase letter.
The interesting thing in Haskell (as well as other functional languages) is that functions are first-class citizens, meaning they can be treated like any other data. This allows us to do things like pass functions as arguments.
> let square x = x * x
> let applyTwice f x = f (f x)
> square 2
4
> applyTwice square 2 -- (2^2)^2
16
This is obviously a contrived example, but it shows that we can pass a function and then call that function.
Loading Haskell files
Ok, before we go any further, lets get off the cumbersome interpreter. A lot of future example code will be in files. When you see the ‘>’ you’ll know I’ve switched back to the interpreter, but just to make it clear, I’ll always show myself loading the file in question
e.g.
-- add.hs
add x y = x + y -- Function declaration syntax in a file (no need for let)
> :load add -- load syntax (leave off the .hs)
[1 of 1] Compiling Main ( add.hs, interpreted )
Ok, modules loaded: Main.
> add 2 4
6
Data
Custom Data Types
So at this point, we know how to
do stuff with built in types
declare variables
declare and invoke functions
load code from a file
Next, we would like to be able declare our own data types, much like being able to define our own classes.
For this section, we will be implementing our own Bool type which we’ll call Loob with values Yes and No
-- loob.hs
data Loob = Yes | No
data types MUST begin with a capital letter, like how functions must be lowercase.
What did this actually do, @Almost?
This defines a new data type, Loob
with 2 data constructors, Yes and No.
There isn’t a good equivalent to a data constructor in imperative languages, but the easiest way to think about it is kind of like an enum. We will see later how this notion isn’t quite right.
enum Loob {
Yes,
No
}
Let’s take a look at our new type.
> :load loob
> let x = Yes
> x
Yes
> :type x -- Gets type information from a variable
x :: Loob -- Format: <name> :: <type>
The let x = Yes statement instructs the interpreter to use the Yes data constructor to instantiate a new Loob and assign it to the variable x.
Meaning the data of x is now Yes and the type of x is now Loob.
In imperative space this is kind of like
Loob x = Loob.Yes
Going back to loob.hs, let’s define our own and function which we’ll call dna.
-- loob.hs
data Loob = Yes | No
dna Yes Yes = Yes
dna No _ = No
dna _ No = No
> load loob.hs
> dna Yes Yes
Yes
> dna Yes No
No
> dna No Yes
No
> dna No No
No
This example shows off a couple of important points in Haskell.
Functions can take arbitrary parameters
the _ in the function definition means “any argument” in can be any type, any value, we don’t care. We know that No and Anything is always No, so we don’t even need to look at the second argument. This does have the side effect that dna No 1234 is defined as No which may or may not be desirable. Since we don’t give _ a name, we can’t use it in the body of our function. It literally means “I don’t need to know, nor even care what this data is”
Functions can have multiple definitions over the same argument data type
It might be hard to see what I mean due to the usage of _ but we could have written out all the cases explicitly:
dna Yes Yes = Yes
dna No Yes = No
dna Yes No = No
dna No No = No
In an imperative world, this should be alarming. We just overloaded dna 4 times and each one takes 2 Loob arguments! If I were to calls:
> let x = No
> let y = Yes
> dna x y
How should it know which of the 4 functions to use?
This is where our enum notion falls apart.
Haskell uses the data constructor for the variables to perform pattern matching to find the first definition that matches the input.
therefore our example above explicitly matches the 2nd definition for dna, so that one will be used.
While defining our own pseudo primitives is nice and all, can we define actually useful types, like say Pair?
Sure. While we didn’t take advantage of it before, our data constructors can act as a tag for other data:
-- pair.hs
data Pair t = Pair t t -- Declare data type Pair of t (where t is an arbitrary type). You might write this as Pair<T> in C++ or Java
left (Pair x y) = x
right (Pair x y) = y
Don’t be confused here. I’ve named both the data type and data constructor “Pair”.
in our functions, left and right, the argument is weird. That looks like how we construct a Pair doesn’t it?
That’s exactly what it is. Haskell’s pattern matching basically says “Could I use this syntax to construct the object that you gave me? If yes then we have a match, if not we don’t have a match”
To try it out:
> :load pair
> let p = Pair 1 2
> left p
1
> right p
2
Lists
Haskell’s list syntax is similar to that of python:
> [1,2,3,4]
[1,2,3,4]
Haskell, however, has syntax to create a list over a range:
> [1..10]
[1,2,3,4,5,6,7,8,9,10]
One interesting fact about Haskell lists, due to the fact that Haskell performs lazy evaluation, an infinite list is completely legal and will not cause a crash or hang unless you attempt to evaluate the whole list:
> let x = [1..]
> -- no crash :)
Accessing an element of the list can be done with the !! operator.
> let x = [1..] -- infinite list
> x !! 9
10 -- 0 indexed
Remember, the lazy evaluation means that only as much as is actually used is evaluated which means that indexing an infinite list is completely legal.
Given a list, you can prepend it with the : operator
> let x = [2,3,4]
> 1:x
[1,2,3,4]
This is handy since we can use this list building syntax as a pattern in a function (as we discussed in the last section)
> let head (x:xs) = x -- x is the first element, xs is the rest of the list
> head [1,2,3]
1
> let tail(x:xs) = xs
> tail [1,2,3]
[2,3]
What we’re saying here is that the argument of head and tail is “Something (x) appended to a list (xs)”
An aside, Haskell represents strings as lists of characters
> head "Almost"
'A'
:type
Simple types
I brought up :type discreetly in the custom data types section. You can learn a lot about how Haskell works by examining the output of :type.
> :type 1
1 :: Num a => a
Read: 1 has type (Num a) which reduces to a
What this actually tells us is that:
1 is not a primitive, but rather of type a
a is restricted to those types in the Typeclass Num
That second point is similar to the concept of requiring that a generic type extends some known type in Java:
public class Foo<? extends Collection>
Other simple examples:
> :t True -- :t works in place of :type
True :: Bool
> :t "Almost"
"Almost" :: [Char]
> :load loob -- from a previous example
> :t Yes
Yes :: Loob
Function types
we see some new syntax when we look at functions
> let square x = x * x
> :t square
square :: Num a => a -> a
Let’s ignore the point that a must be in the Num Type class and focus on the a -> a
You can read this as a goes to a
Meaning, square takes a variable of type a and returns a type a.
As a concrete example, square can take an integer and return an integer.
What’s the type of a function that takes 2 variables?
> let add x y = x + y
> :t add
add :: Num a => a -> a -> a
Hmm. a -> a -> a huh?
“a goes to a goes to a?” I don’t get it.
well, let’s apply some grouping to help us out.
Let’s read it as a -> (a -> a) For now, trust me that this is better than (a -> a) -> a. I’ll explain why this is the case later.
In this view, we read “a goes to (a goes to a).”
Still a little fuzzy? Well, remember that
a -> a means that it’s a function that takes an a and returns an a.
Therefore
a -> (a -> a) should mean that it’s a function that takes an a and returns a function that takes an a and returns an a.
But that doesn’t make sense. Add takes 2 numbers and returns one number. It doesn’t take 1 number and return a function.
Or does it?
> :t (add 1)
(add 1) :: Num a => a -> a
Why did that work? Add takes two arguments god damn it! That should have given us an error!
In imperative languages we’re used to the fact that a function that has two arguments requires that you supply two arguments when you call it (or at least that you specify default in the case of optional values such as int myFunc(int myInt, bool myBool = false)).
Haskell has no such requirement.
When we evaluate add 1 2 anywhere we see x we replace it with 1 and anywhere we see y we replace it with 2.
Therefore add 1 2 is equivalent to 1 + 2
But what forces us to replace both x and y? Why can’t we replace just x?
Why shouldn’t we call add 1 and get back 1 + y?
> Because you can’t evaluate 1 + y, @Almost…
Exactly. We can’t evaluate it until we are supplied with a y. So instead of trying to evaluate it, we just give that expression back and say “give me a y and I’ll evaluate this.”
Gee, that sounds like a function, doesn’t it?
Rather than just looking at the type, let’s try giving that new function a name
> let addOne = add 1
> addOne 2
3
Woah. So it really does give us back a function.
This magic process of partially evaluating functions is called Currying (Named after one Haskell Curry. The guy was a bit of a narcissist: he named both a programming language and a mathematical concept after himself.)
Currying has a very useful property in that it naturally supports code reuse and encapsulation
A simple contrived example:
> let pow x y = y ** x --Yes, this is a bit backwards from how we usually think of pow
> let square = pow 2
> let cube = pow 3
As you can see, it gives us a really simple way to build software without reinventing the wheel or writing many many lines of code. On a philosophical level, this makes it easy for us to show that cube is just a specific case of the more general pow. We don’t get such an obvious connection if we write cube as cube x = x * x *x.
Now. An astute reader might be thinking: isn’t it kind of weird that we have 2 ways to evaluation functions: fully and partially?
Actually this is not the case. There is only one way to evaluate a function: By supplying one argument.
add 1 2 is not passing 2 arguments to add, rather it is actually evaluating as (add 1) 2 - apply 1 to add and the apply 2 to the resulting function.
What implication does this have on how Haskell treats functions?
In Haskell a function has exactly one input and exactly one output.
Either or both of the input and output can be functions, but there cannot be more than one input or output.
Now, before we wrap up our section on function types and currying, let’s talk about why we chose a -> (a -> a) instead of (a -> a) -> a.
How do we read the second one?
“add takes a function and returns an a”.
This should be valid then:
> add add
clearly that doesn’t make sense, therefore (with all the handwaving I have in me :P) this is not a valid way to read that type signature.
In the case where the argument truly is a function, Haskell will indicate that to us
>let applyToOne f = f 1
> :t applyToOne
applyToOne :: Num a => (a -> t) -> t
Don’t be scared by t. This just tells us that the return type of the function f as well as applyToOne isn’t necessarily the same type as the type o the input to f
e.g.
> let equalsOne x = x == 1
> :t equalsOne
equalsOne:: (Eq a, Num a) => a -> bool
Conclusion
tl;dr There is no tl;dr. Go read it.
Whew. Rushed is probably a better word than brief. I hope that was reasonably easy to follow, and if not, please ask questions. I want to both help you understand and improve my own technical communication.
If some sections make absolutely no sense even with some of my clarification, don’t worry too much. This stuff is very different from the usual “state-machine” style of programming that most of us are familiar with. But just because it’s different, don’t count it out. A lot of the concepts introduced in functional programming are making their way back into mainstream programming languages (e.g. Lambda Expressions, Nullable Types, LINQ).
If you’re interested in learning more about Haskell, Learn You a Haskell for Great Good is the standard Textbook. It’s free online legally, so no need to go searching for it.
Hope you enjoyed that. Help me improve this!
<3 Almost