Writing a Parser in Haskell

It is common to use a family of libraries known as parser combinators for parsing in Haskell. These parser combinators permit us to compose higher-order functions to generate parsers. They are a particularly expressive pattern that allows programmers to prototype language grammars in a small embedded domain language in Haskell. Most significantly, it is possible to embed custom Haskell logic inside of the parser. A parser is structurally a function that takes a stream of characters as input and yields a parse tree. Parsing takes a list of tokens and transforms it into an abstract syntax tree. The syntax tree has parameters as children and some form of meaning as its node.

In this article, our experts will explain how parsing is done in Haskell using monads. They have assumed that a list of tokens can be generated from a lexical analysis.

Parsec

Parsec is a library that defines a set of common combinators. The table below shows the combinators and their description:

Combinator Description
String It matches the given string
Char It matches the given character
Many This combinator matches an arbitrary number of patterns, matches the patterns and returns them as a list
Many1 It is just like the many combinator but it requires at least one match
<\> It can be chained sequentially to generate a sequence of options. It is known as the choice operator. It tries to parse the first argument before proceeding to the second
SepBy It matches an arbitrary length sequence of patterns. It is delimited by a given pattern
parens It is used to parse the given pattern surrounded by parenthesis
Optional It parses a given pattern optionally, returning its value as maybe
Try The backtracking operator will allow us to parse ambiguous matching expressions and restart with a different pattern

 Tree description

A tree is usually represented by its nodes and operands. The nodes represent the operation while the operands as its children. For example, in the expression a+b+c, ‘+’ is the root and identifier, a is the left child, and the other ‘+’ subtree is the right child. Now let us discuss the simple definition of tree nodes for the operations ‘/’, ‘*’, ‘+’, and ‘-‘.  In arithmetic, division and multiplication have priority over subtraction and addition. Also, brackets have priority over multiplication and division. As a result, the expressions in brackets are always a subtree of division and multiplication. As a result, since we have three levels of priority, we will have three types of nodes. In addition and subtraction, expressions should have the operator + or – and two operands.

Monadic Parsing

Suppose we want to parse somehow a list of tokens. The question will be what should the operation return? The operation should return a given tree node and the rest of tokens which are not parsed. This can be modeled as a state monad where:

  • State is the list of tokens
  • Output is the resulting tree node or something else

The result of parsing is something that returns results and a list of remaining tokens to be passed or some kind of error. Do you need a code example of monadic parsing? Avail our Haskell programming assignment help.

How is Parsing done?

Implementing parsing is easy once you are familiar with the definition of nodes. When we define two functions:

  1. GetToken -One which represents computation. It returns the current token as result and doesn’t change the state of tokens (the token remains in the list of tokens)
  2. Expect – This function returns parsing error if the given token doesn’t exist in the list. It will remove that token from the list if the token exists. This function can also be defined by other names. For example, in other articles or tutorials, the experts may call it “eat function” because it eats token if it exists.

Additionally, we will have some function which always returns an error with a given message for producing an output during parsing.

We can easily do parsing from definitions of tree nodes. Take an example of factor parsing where:

  • we start by getting the current token and checking various possibilities
  • Then, we handle these possibilities
  • We output and error if nothing is satisfied

Also, we can use the expression method to easily parse an expression inside the brackets.

Do you want to learn more about parsing? Hire our programming experts today.