I want to say a few words on how I write Haskell as of 2017.
Before I go on, I want to make very clear: this is neither a style recommendation nor a judgment on any other style. This is how I write Haskell, and many of the principles below I’ve come to either because of past experiences with writing or maintaining Haskell, because of how I originally learned or was taught Haskell, or because of experiences I’ve had with other languages in general.
However, even though I like to adhere to these principles, I don’t want to advocate that any of them are universal! You probably write Haskell differently, and that’s not just okay but actively good: one of the great things about Haskell as a language is that it’s a powerful raw material that permits all kinds of different powerful and expressive idioms, and someone else’s Haskell style will definitely take advantage of Haskell in ways mine doesn’t.
The Boring Basics
This stuff isn’t actually interesting, and I make an effort to stick to the code style of the codebase I’m writing in, but here’s the textual details of how I write Haskell in the absence of any other factors:
I prefer two spaces and, whenever possible, to keep code further to the left: that means, for example, that I like to start a new line after the
= in data and (if they’re longer than one line) after function heads and after the
do keyword, in order to ‘swing’ the indentation back left. I also prefer left-aligning operators as much as possible, and I do like alignment of repeated operators within lines (like
:: in record definitions or
case-expressions). For example:
A special case of my always-to-the-left principles is that if a function applies a short expression to something which can be ‘swung’ to the left, like a
do block or a list comprehension, then I’ll often include the shorter expression on the same line as the function head, and then swing the expression it applies to. That’s a bit abstract, so for a concrete example:
(I’ll also talk about this in more detail later, but I use list comprehensions pretty regularly in day-to-day programming, and I think they’re the right choice for a lot of iteration.)
camelCase my names. This is largely for consistency—I actually find
kebab-case to be vastly more readable, and in the absence of an existing community style, I probably would gravitate to them—but it’s what Haskell uses, so it’s what I use.
I try to name things using names whose length is proportional to the range of the scope in which they appear—that is, fewer characters for a scope of a line or two, more characters if a name shows up for a longer block—but I tend to feel that it’s never a bad idea to pick a slightly longer name. I definitely have a few conventions I hold to lightly for particular varieties of names:
x:xsfor iterating over a generic sequence, and a similar
_:_sfor iterating over sequences of specific things. (e.g. if I’m iterating over a list of addresses, I might use
addr:addrsor, for a single-line function,
Functions for creating a thing are generally
mkThing. I mostly like
mkbecause it’s evocative but also only two characters: it gets out of my way.
Things which might be
NothingI often give an
Mbsuffix, specifically when I’m going to discriminate on that same value and pull out the interior value. For example:
Naming is an art that’s worth getting good at, and I don’t know that I’m great at it, but I do like to think hard about exported names in APIs before I commit to a name.
There are some extensions that are so comfortable to me that I almost always turn them on if there’s even a minor chance I might need them: specifically,
There are a lot of syntactic extensions that I’m okay with if I feel like they’re pulling their weight: a good example here is that some code can be made significantly cleaner by application of extensions like
RecordWildCards. I’m not likely to turn those on in a source file unless I feel like not having them is painful. For a concrete example: parsing and reading JSON using the
aeson library is much easier and more readable with
RecordWildCards enabled, but I wouldn’t use it if it made just one function nicer.
I have a similar relationship with a some of the type system extensions:
TypeFamilies all can make code cleaner and more powerful, but again, I’m not going to turn them on unless it becomes very clear that I need them, or that a particular desirable API is straight-up inexpressible without them. Of these, I want to call out
TypeFamilies in particular: sometimes, using
TypeFamilies, especially closed type families, can make certain abstractions much nicer, as in using a type family to change the types of AST nodes based on the current phase of a compiler.
A lot of other extensions, though, I’ve never felt like I needed, and when I’ve used them I often feel they obscure the problem I’m solving. Many extensions—even some of the ones I’ve already mentioned—can make refactoring or analyzing code harder, and there are often better tools to reach for in those situations. In particular, almost any extension that exists to help with the definition or selection of more elaborate typeclass instances or contexts—things like
ConstraintKinds, and so forth—tend to tip me off to the fact that I’m solving a problem with typeclasses that I should be solving another way. I’ll get into that a little more down below, when I talk about my relationship with typeclasses in Haskell code.
(If I’m being honest with myself, I actually kind of like
MonadComprehensions—and I have semi-seriously pushed for
ApplicativeComprehensions in the past as a better alternative to
ApplicativeDo, which I find slightly rickety—but I don’t actually use monad comprehensions in my code in practice.)
Now this section contains what I think is interesting: not, say, how many spaces to indent things, but rather, what kind of higher-level trade-offs to make while writing code.
Haskell records are bad, but use them anyway
I use a lot of records in my code, and the reason is simple: I like giving things names. I know naming things is supposed to be one of the hard problems in computer science, but in this case, I like to think of names as documentation of the purpose of a piece of data: giving a field a name tells me what that field is supposed to be or do. Any time a data type has exactly one constructor, I will try to give every piece of data inside it a field name. If it’s a
newtype, then it might be a field name like
fromFoo, but I still do that anyway.
I should also be clear here: I use record syntax when I have a type with just one constructor. Partial functions are a dangerous thing to include and can silently introduce errors into code, and record selectors defined on only some constructors end up being partial functions! When I say ‘records’ here, I mean ‘single-constructor types with record syntax’.
When creating values of a record type, I effectively always use the record syntax to do so. This is especially useful for records that contain more than two or three fields: sure, for a five-argument constructor, I could always look up what the arguments mean for every constructor and what order they appear in, but it’s so much nicer to provide those by name and have that redundant characterization.
It’s also a powerful way of refactoring function calls with more than two or three arguments. Haskell doesn’t have named parameters, but they can be faked by bundling the information needed for a call into a single piece of data, and then that allows for e.g. providing defaults that can be overridden with record update syntax. I consider doing this for functions of four or more arguments; by the time I get to about six or seven arguments, not doing this is almost criminal.
This brings me to another point:
Never hesitate to create a type alias or—even better—a type
It doesn’t matter if it doesn’t get exported, or only gets used by one function, or has a wonky name: types are always good. One of my major pet peeves about Haskell-the-language is that I can’t define new types within
let bindings or
where clauses, like one can in other ML languages, because sometimes I want a new type for just a few lines. Doesn’t matter. Make it. This can go a long way towards clarifying data-structure-heavy code, and almost always makes it less susceptible to bugs, too.
It doesn’t matter if the type I’m defining is 100% isomorphic to some other type, too: a common pattern in code I’ve written is to include a type which is identical to
Maybe, occasionally monomorphized to some particular type, like
In my view, a type like this is often better than just using a
Maybe Text in these situations: using a
Maybe is more flexible, but the flip-side is that it permits a slew of operations that I might not want to be valid on a
Maybe Text tells a reader little about where it came from—it might or might not have a
CommandResult would localized to a smaller chunk of code.
As a related point:
Specialize for intent
This is one where I run against current Haskell orthodoxy, not to mention popular Haskell linters: while I like exporting polymorphic functions, I often prefer using (comparatively) monomorphic functions, even when they’re a monomorphic variation on a more general polymorphic function. A good example here is
concatMap, a function which I genuinely like. In many situations,
concatMap acts exactly like the monadic
(>>=) operator, but with the monadic type monomorphized to
The only difference between the two functions (at present) is that
concatMap can take an arbitrary
Foldable value and turn it into a list. However, code that uses
concatMap with an argument whose type is known to also be
[a] will raise complaints from linters like hlint, which suggests using
This runs contrary to my taste in code: I prefer using
concatMap in these situations precisely because it is less generic. The typechecker can easily tell that my use of
(>>=) is working on lists, and upon deeper examination of the code I can of course find that out, but that may not be obvious to me, the reader, from a perusal of the textual context of the code: on the other hand, the function
concatMap always produces a list, and thus gives a human reader specific, concrete type information, even if that type information is redundant to the type-checker.1
In general, if I’m not explicitly working within an abstraction, I don’t want to borrow generic features of that abstraction: I am happy to use
++ instead of
<> to concatenate lists, and type-specific
length functions instead of a generic
Foldable one, and I do not like to rely on the
Arrow typeclass unless I am explicitly writing
Arrow-polymorphic code. Borrowing an abstraction to do something I can do without that abstraction feels like mixing concerns for me: if I have to keep details of the current code in mind, I’d rather be thinking in terms of what the code is actually doing, rather than in terms of a higher-level abstraction that’s not actually abstracting anything.
Of course, if I am writing something that needs to be polymorphic, then of course I will use the polymorphic functions: if I’m writing a snippet of code that I want to be able to use different
Arrow2 instances, then those polymorphic functions make sense. That means that I am using those abstractions as abstractions. Similarly, sometimes an API has been designed so that no monomorphic operations are actually available: an example of this is the excellent
trifecta parsing library, which is written to expose most practical operations in terms of a
Parsing typeclass defined elsewhere, rather than exporting its own monomorphic operations. (This is, after all, why these abstractions exist in the first place: so that people can provide functionality while abstracting over them!) But, all else being equal, if I have to choose between a monomorphic and a polymorphic version that are otherwise the same function, and I’m working with concrete types that I know, then I will generally prefer the monomorphic function.
Another, related phenomenon is redundant typeclass constraints. As a simple example: say I’m implementing a set using a binary tree, ordering it by the type’s
Ord instance. An
insert operation will of course need to have an
Ord constraint, but in theory, my
empty set doesn’t actually need any such constraint: after all, the
Ord constraint is only necessary in
insert because I’m comparing the inserted value against existing values in the set. Why include an
Ord constraint for
empty, where there are no values to even attempt to order?
data Set a = Node a (Set a) (Set a) | Leaf insert :: Ord a => a -> Set a -> Set a insert datum Leaf = Node datum Leaf Leaf insert datum node@(Node c l r) | datum < c = Node c (insert datum l) r | datum > c = Node c l (insert datum r) | otherwise = node -- This constraint is unnecessary! empty :: Ord a => Set a empty = Leaf
In many (although not all) cases, I would prefer to include such a redundant constraint, and the reason, yet again, is communicating programmer intent. While there’s nothing stopping me from creating an empty
Set of an unorderable type, there is also no practical reason to ever do so: I would never be able to insert or remove anything from it, and I could only ever peform entirely trivial operations on it, such as observing that its size is indeed zero. My intent in exposing this API is for all sets to contain orderable types: why not enforce that intent with these constraints, even if those constraints aren’t strictly necessary in the function body?
In many of the above cases, what I am doing is deliberately opting out of abstractions in favor of concrete knowledge of what my program is actually doing. Abstractions can enable powerful behavior, but abstractions can also obscure the simple features of what a program is doing, and all else being equal, I’d rather communicate to a reader, “I am appending two lists,” instead of the more generic, “I am combining two values of a type that implements
Qualify imports, even when it’s not strictly necessary
This is a habit that I’ve come around to after writing several larger pieces of software in Haskell, especially Matterhorn: qualified imports are pretty much always a good thing. At some point, my habit was to carefully curate import lists instead, but there are trade-offs there: it becomes easy to find out where a function comes from by looking at the source file header, but harder to find out where it comes from when looking at the use site. By qualifying imports, I can instead get information at the use site about where an import comes from, which I’ve found to be more and more useful over time: coming in and reading the center of a module, I can easily observe that this function comes from the module imported as
X while this one comes from the module imported as
A side-note here is that sometimes, especially when I’m using only one or two functions from an external module, or when two or more external modules are closely related in their operation, I will import multiple modules qualified under a single namespace. For example, if I need one or two functions from both
System.FilePath, then I might import them both to the same shared namespace:
I also do this often with
Data.Text.IO together. This is something to be done with care, and it’s good to err on the side of importing each module with a distinct qualified name, but occasionally it does help readability to keep the number of distinct module names in use in a particular file low.
Treat imports and dependencies as a minor liability
External dependencies in general have a cost: it’s not a huge one, but a cost nonetheless, and I think it’s worth weighing that cost against the benefit pretty actively.
A minor cost—one that matters, but not all that much—is total source size. Haskell programs tend to pull in a lot of dependencies, which means compiling all of them can take a very long time and produce a lot of intermediate stuff, even if a comparatively small amount makes it in to the final program. This isn’t a dealbreaker—if I need something, I need it!—but it’s something to be cognizant of. If my from-scratch build time triples because I imported a single helper function, then maybe that dependency isn’t pulling its weight.
A bigger cost—in my mind, anyway—is breakage and control. It’s possible (even likely) my code has a bug, and then I can fix it, run my tests, push it. It’s also possible that an external dependency has a bug: fixing this is much more difficult. I would have to find the project and either avail myself of its maintainer (who might be busy or overworked or distracted) or figure it out myself, learn the ins and outs of that project, its abstractions, its style, its contribution structure, put in a PR or discuss it on an email list, and so forth. And I can freeze dependency versions, sure, but the world goes on without me: APIs break, bugs get introduced, functionality changes, all of which is a minor but present liability.
It’s not a dealbreaker, but it’s a cost, one that I weigh against the value I’m getting from the library. Should I depend on a web server library for my dynamic site? Well, I’m not gonna write a whole web server from scratch, and if I did, it’d take forever and be worse than an existing one, so the benefits far outweigh the costs. But on the other hand, if I’m pulling in a few shallow helper functions, I might consider replicating their functionality inline instead. There’s always a cost: is it worth the weight of the dependency?
A concrete example where I decided, “No, it’s not worth the dependency,” is in my
config-ini INI parsing library. It includes an API that uses lenses. The
lens library is very heavyweight: it includes a lot of transitive dependencies and a significant amount of code in its own right, and even minimal
lens-compatible libraries like
lens-family-core are comparatively hefty for what I needed: the
Lens type alias and the ability to
set a lens. Redefining those inline takes less than a dozen lines of code. In this case, the cost of duplicating those functions—easily verified as correct!—in my module was low enough that I didn’t bother importing any lens library at all!
But of course, the Matterhorn chat client I work on does import a lens library. I’m less concerned about the cost of dependencies for a final executable, as it’s not going to inflate anyone else’s binary size or build process, and we use a significant number of other helper functions. In that case, the cost of duplicating that functionality inline is much, much greater than just using the library.
So my principle here isn’t “always minimize dependencies”, but rather, “always remember to weigh the value of the dependency against its cost.”
Iterate with list comprehensions
I’ll start out by saying that there are times I wouldn’t prefer list comprehensions. For example, if I’m just
mapping an existing function over a list, then I’d definitely prefer
map f xs to its comprehension cousin
[ f x | x <- xs ]: the former is clearer in its intent and doesn’t introduce any new short-lived names. Similarly, a quick
filter or even a
map f . filter g will usually be shorter and clearer than the comprehension
[ f x | x <- xs, g x ].
However, there are a lot of situations where a comprehension is much clearer than its hypothetical
filter-based equivalent. One of the biggest advantages to comprehensions, at least in my experience, is that they can use refutable patterns in order to simultaneously decompose and filter expressions from a list. Consider the actual definition of the
catMaybes function from
We could write this using a combination of existing partial functions (i.e. as
map fromJust . filter isJust) or other abstractions (
>>= maybeToList) or we could write it with manual recursion, but this snippet is, in my opinion, much more terse and readable, and doesn’t rely on any other helper functions. This extends to more complicated list processing tasks, as well:
For comparison, here’s a point-free alternative to this snippet:
Some people may prefer this style (or might prefer a slightly different variation, e.g. by combining the
concat . map personPets into
(>>= personPets)), but for my personal preferences, I prefer reading the comprehension-based one: both of them transform a list in a ‘step-by-step’ manner, but the comprehension-based one has three ‘logical’ steps that correspond to decomposing each element of the
people list, selecting and decomposing each
pet from a person’s list of pets, and then joining the final string, while the point-free one includes several steps or details related to massaging the relevant types (c.f. the
catMaybes lines), and also relies on a handful of other helper definitions that aren’t necessary in the comprehension case.
ParallelListComp is a very good extension, because it provides a single language-level mechanism that otherwise would have to be filled with various
zipWith variants of various arities. (I will admit I’m less sold on
TransformListComp, but maybe I haven’t found a good use for it yet!)
Be 100% sure a typeclass is the right choice
I’m not gonna go hard-line like some people and assert that typeclasses are “considered harmful” or that they should be blanket avoided, but I do think they’re easy to overuse, and it’s easy to reach for one without weighing the upsides and downsides of typeclasses. And what’s more, non-typeclass solutions can be surprisingly powerful!
The biggest advantage typeclasses provide is implicit instance selection. This is why typeclasses are such a powerful net win when working with, say, numeric types: without typeclasses, we’d have to have a separate monomorphic
+ for each numeric type, and that’d get unwieldy pretty fast! With a typeclass, we get that instance selection for free.
But on the other hand, typeclasses necessarily permit at most a single instance per type. In the case of
+, that’s pretty reasonable, but it’s a little bit of a harder proposition when talking about something more like
show. There are a lot of ways to turn a value into a
String: we might want to control indentation, or padding, or use different representations (e.g. decimal or octal or hexadecimal for numbers), all of which would require elaborate newtype wrappers if we wanted to do them with the existing
Of course, I’m picking on
Show when I shouldn’t be:
Show isn’t for tight control over printing, it’s for quick programmer-focused output: if I wanted to tightly control how to print a structure, then I should be using a
printf-style library or a pretty-printer, not
Show. Let’s instead talk about a classic example of a typeclass that I think shouldn’t be a typeclass: QuickCheck’s
A simplified form of QuickCheck’s
Arbitrary typeclass looks like
Gen monad encapsulates non-deterministic generation of structures along with some extra information about the ‘size’ and ‘depth’ of what’s being generated. In practice, that means that an
Arbitrary instance for a type is going to look pretty mechanical: most of the fields can be initialized with the result of a simple call to
Ah, but wait, I say to myself in this contrived scenario! My application’s validation code only ever creates
Pet values with non-negative ages, so most of my QuickCheck tests—the ones that test internal application functionality—are going to only be interested in valid
Pet values. I will modify my instance accordingly: in this case, I can using the QuickCheck-provided
Positive newtype, whose
Arbitrary instance will only ever create positive numbers, in order to only ever generate pets with non-negative ages:
Ah, but sometimes I do want to run QuickCheck tests over invalid
Pet values, just in case. Well, what to do now? I guess I need a newtype wrapper over
Pet, too, so I can have one instance for valid
Pet values and another for possibly-invalid ones:
Ah, but also pet names aren’t going to be arbitrary unicode sequences; they’re going to be validated so that people don’t name their pets emoji or the empty string or something like that, so now I have two axes along which a pet might be invalid. Do I need four newtypes, for the cross-product of the four configuration options I want?
I’m of course being obtuse: my obvious choice here would be not to create an
Arbitrary instance for
Pet, but rather to create a function that returns a
Gen value that I can use explicitly:
While somewhat contrived, I don’t think the above is an unlikely scenario. In most situations, I wouldn’t want just a single, non-modifiable generator for a type: I’d almost always want a generator that has some controls, or one of several different generators depending on context. Even with simple types like the built-in numeric types, I would want to generate numbers that have various properties that aren’t inherent to the type: an
Int that’s always non-negative, or that’s a valid index into a list, or something like that. QuickCheck offers various newtypes to work around this, but even still, one often has to resort to other generation methods (e.g. by filtering by a check or generating from a list of possibilities instead of using the bare
All that is to say: the
Arbitrary typeclass makes sense if the choice of
Gen t is always a function of the type
t, but in practice, that’s rarely true.
I’ve established now that some uses of a typeclass like
Arbitrary are probably ill-advised, or at least could be better served by using named functions: but of course, QuickCheck does have many named functions of that sort which coexist peacefully with
Arbitrary instances. And there’s another big factor in the use of the typeclass: the convenience of being able to use a single value name (
arbitrary) instead of looking up a different name for every generator function. There is something very pleasant about being able to write
(,) <$> QC.arbitrary <*> QC.arbitrary: no need to look up the right function name for every type, and using
Gen becomes almost mechanical!
But in the long term, I would still prefer an explicit version. Consider what the above, naïve generator—the one that only generates ‘valid’ pets—would look like using the API provided by
hedgehog, a newer QuickCheck-like library that does not include a typeclass analogous to
Notice that every
Gen function is explicit about exactly what it’s generating: a
Maybe wrapper around a
String of a length that’s at least 1 and at most 50; one of the two values
Cat; an integer in the range
30. This is definitely more tedious to write than the typeclass-based version: I can’t just spam
arbitrarys separated by
<*> until it compiles, now I gotta make sure I know the right function names, and it’s noisier and bigger, and there’s more information per line…
But if I haven’t touched this code in weeks or months or years, and I come back to it: I can clearly see what it’s doing. I don’t have to go consult the definition of
Pet to find out what the types are and what instances are getting called, because I know exactly what functions are getting invoked at every point.
That is to say: the implicitness of the
Arbitrary approach is optimized for writing, but the explicitness of manual functions is optimized for reading and remembering.
And I haven’t gone into the other places where typeclasses can fall down: for example, the way that typeclass resolution can sometimes fail, the various complicated extensions that can be used to get around typeclass selection, the sometimes voluminous and sometimes inscrutable error messages that they can sometimes generate: all those are factors, sure, but they’re not the main reason that I try not to reach for typeclasses: even if all those things were solved via some kind of technical magic or a tower of
LANGUAGE pragmas, I still would try to use explicit functions instead of typeclasses in my Haskell code.
I also should reiterate that I still use typeclasses, and sometimes I do want them! There are times that the explicitness is too heavy a cost, or the instance selection maintains invariants that I want to have maintained, or the code is just plain better-looking. But I also think that there are many places in Haskell where a typeclass is used and an explicit function or record of functions would be preferable, which is why my personal tendency is to avoid writing a new typeclass until I’m sure that what I need is a new typeclass.
There’s a unifying theme to a lot of my Haskell style, and it is this: be explicit and use names. Why pervasively use record fields? I’m being explicit about the purpose of those pieces of data by giving those fields a name. Why qualify imports? I’m being explicit about their provenance by giving their provenance a name. Why use functions or records of functions instead of typeclasses? I’m being explicit about what functionality I’m dispatching to by giving that functionality a name. A related theme is: optimize for reading code later. If it takes longer to write or uses more lines or more variable names, but it’s going to be clearer to another person reading my code—or to me, coming back to that source file in a month or a year—I will absolutely take the time to write the longer code.
But also, like a mediocre high-school essay writer, I want to reiterate what I wrote at the beginning: this is one of many styles, and it has its own tradeoffs! (Verbosity being a big one here.) Your style might be (and probably is) different, and that’s a good thing: in fact, I’d be interested in reading analogous posts to this one that describe the specifics of and motivation for other people’s Haskell styles! I don’t think there’s a “wrong way” to write Haskell: there’s a multiplicity of possibilities, and we as Haskell users should embrace that!
…well, unless y’all use
TypeInType, in which case you’re clearly depraved in ways that medical science cannot fix.
Of course, sometimes using type information isn’t redundant: occasionally, there are situations where some function is so polymorphic that Haskell can’t even figure out the appropriate concrete type, and then using a monomorphic function is helpful not just to a human reader but is actually helpful to the type-checker! But that’s just icing on the cake: even if the typechecker has more than enough information to infer the appropriate type, I still would often prefer to use the monomorphic versions.↩
Y’know, if there were other
Arrowinstances that mattered for some reason.↩