There are certian things haskell is very good at, and I had the pleasure
of using it for one such thing yesterday. I wanted to support
expressions like find(1)
does, in git-annex.
Something like:
git-annex drop --not --exclude '*.mp3' --and \
-\( --in usbdrive --or --in archive -\) --and \
--not --copies 3
So, parens and booleans and some kind of domain-specific operations. It's easy to build a data structure in haskell that can contain this sort of expression.
{- A Token can either be a single word, or an Operation of an arbitrary type. -}
data Token op = Token String | Operation op
deriving (Show, Eq)
data Matcher op = Any
| And (Matcher op) (Matcher op)
| Or (Matcher op) (Matcher op)
| Not (Matcher op)
| Op op
deriving (Show, Eq)
(The op
could just be a String
, but is parameterized for reasons we'll
see later.)
As command-line options, the expression is already tokenised, so all I needed to do to parse it is consume a list of the Tokens. The only mildly tricky thing is handling the parens right -- I chose to not make it worry if there were too many, or too few closing parens.
generate :: [Token op] -> Matcher op
generate ts = generate' Any ts
generate' :: Matcher op -> [Token op] -> Matcher op
generate' m [] = m
generate' m ts = uncurry generate' $ consume m ts
consume :: Matcher op -> [Token op] -> (Matcher op, [Token op])
consume m [] = (m, [])
consume m ((Operation o):ts) = (m `And` Op o, ts)
consume m ((Token t):ts)
| t == "and" = cont $ m `And` next
| t == "or" = cont $ m `Or` next
| t == "not" = cont $ m `And` (Not next)
| t == "(" = let (n, r) = consume next rest in (m `And` n, r)
| t == ")" = (m, ts)
| otherwise = error $ "unknown token " ++ t
where
(next, rest) = consume Any ts
cont v = (v, rest)
Once a Matcher is built, it can be used to check if things match the expression the user supplied. This next bit of code almost writes itself.
{- Checks if a Matcher matches, using a supplied function to check
- the value of Operations. -}
match :: (op -> v -> Bool) -> Matcher op -> v -> Bool
match a m v = go m
where
go Any = True
go (And m1 m2) = go m1 && go m2
go (Or m1 m2) = go m1 || go m2
go (Not m1) = not (go m1)
go (Op o) = a o v
And that's it! This is all nearly completly generic and could be used for a great many things that need support for this sort of expression, as long as they can be checked in pure code.
A trivial example:
*Utility.Matcher> let m = generate [Operation True, Token "and", Token "(", Operation False, Token "or", Token, "not", Operation False, Token ")"]
*Utility.Matcher> match (const . id) m undefined
True
For my case though, I needed to run some IO actions to check if expressions
about files were true. This is where I was very pleased to see a monadic
version of match
could easily be built.
{- Runs a monadic Matcher, where Operations are actions in the monad. -}
matchM :: Monad m => Matcher (v -> m Bool) -> v -> m Bool
matchM m v = go m
where
go Any = return True
go (And m1 m2) = liftM2 (&&) (go m1) (go m2)
go (Or m1 m2) = liftM2 (||) (go m1) (go m2)
go (Not m1) = liftM not (go m1)
go (Op o) = o v
With this and about 100 lines of code to implement specific tests like --copies and --in, git-annex now supports the example at the top.
Just for comparison, find(1)
has thousands of lines of C code to
build a similar parse tree from the command line parameters and run it.
Although I was surprised to see that it optimises expressions by eg,
reordering cheaper tests first.
The last time I wrote this kind of thing was in perl, and there the natural way was to carefully translate the expression into perl code, which was then evaled. Meaning the code was susceptible to security holes.
Anyway, this is nothing that has not been done a hundred times in haskell before, but it's very nice that it makes it so clean, easy, and generic.