If you compile your Alex file without a
%wrapper declaration, then you get access to
the lowest-level API to the lexer. You must provide definitions
for the following, either in the same module or imported from
another module:
    type AlexInput
    alexGetByte       :: AlexInput -> Maybe (Word8,AlexInput)
    alexInputPrevChar :: AlexInput -> Char
The generated lexer is independent of the input type,
which is why you have to provide a definition for the input type
yourself. Note that the input type needs to keep track of the
previous character in the input stream;
this is used for implementing patterns with a left-context
(those that begin with ^ or
set^). If you
don't ever use patterns with a left-context in your lexical
specification, then you can safely forget about the previous
character in the input stream, and have
alexInputPrevChar return
undefined.
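As an illustration of the three definitions above, here is a minimal sketch of one possible input type. The record AlexInput, its fields prevChar and remaining, and the helper mkInput are hypothetical names chosen for this example, and the sketch assumes single-byte (Latin-1) input so that each byte is also one character. It is not the representation used by any of the wrappers.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical input representation: the previously consumed character
-- together with the bytes that are still to be scanned.
data AlexInput = AlexInput
  { prevChar  :: Char
  , remaining :: [Word8]
  }

-- Build the initial input (hypothetical helper). Starting with '\n' as the
-- previous character lets patterns beginning with ^ match at the very start.
-- Assumption: every character fits in one byte (code point <= 255).
mkInput :: String -> AlexInput
mkInput s = AlexInput '\n' (map (fromIntegral . ord) s)

-- Hand the lexer the next byte, remembering it as the new previous character.
alexGetByte :: AlexInput -> Maybe (Word8, AlexInput)
alexGetByte (AlexInput _ [])       = Nothing
alexGetByte (AlexInput _ (b : bs)) =
  Just (b, AlexInput (chr (fromIntegral b)) bs)

-- Report the character consumed just before the current position.
alexInputPrevChar :: AlexInput -> Char
alexInputPrevChar = prevChar
```

A multi-byte encoding such as UTF-8 would need extra state, because alexGetByte returns one byte at a time while alexInputPrevChar must return a whole character.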
Alex will provide the following function:
    alexScan :: AlexInput             -- The current input
             -> Int                   -- The "start code"
             -> AlexReturn action     -- The return value

    data AlexReturn action
      = AlexEOF
      | AlexError
          !AlexInput     -- Remaining input
      | AlexSkip
          !AlexInput     -- Remaining input
          !Int           -- Token length
      | AlexToken
          !AlexInput     -- Remaining input
          !Int           -- Token length
          action         -- action value

Calling alexScan will scan a single
token from the input stream, and return a value of type
AlexReturn. The value returned is either:
AlexEOF: The end-of-file was reached.
AlexError: A valid token could not be recognised.
AlexSkip: The matched token did not have an action associated with it.
AlexToken: A token was matched, and the action associated with it is returned.
The action is simply the value of the
expression inside {...} on the
right-hand-side of the appropriate rule in the Alex file.
Alex doesn't specify what type these expressions should have; it
simply requires that they all have the same type, or else you'll
get a type error when you try to compile the generated
lexer.
Once you have the action, it is up to
you what to do with it. The type of action
could be a function which takes the String
representation of the token and returns a value in some token
type, or it could be a continuation that takes the new input and
calls alexScan again, building a list of
tokens as it goes.
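The first option, where each action is a function from the token's String to a token value, might be driven by a loop like the sketch below. So that it compiles on its own, the sketch uses String as a stand-in AlexInput, redeclares AlexReturn locally, and takes the scanning function as a parameter named scan. In a real Alex file you would drop those stand-ins and call the generated alexScan directly. The name scanAll is hypothetical, and the sketch assumes the token length counts characters of the stand-in input.

```haskell
-- Stand-in input type (an assumption for this sketch): the remaining text.
type AlexInput = String

-- Local copy of the type Alex generates, so the sketch is self-contained.
data AlexReturn action
  = AlexEOF
  | AlexError !AlexInput
  | AlexSkip  !AlexInput !Int
  | AlexToken !AlexInput !Int action

-- Repeatedly scan until end of input, applying each action to the text of
-- the token it matched. `scan` stands in for the generated alexScan.
scanAll :: (AlexInput -> Int -> AlexReturn (String -> tok))
        -> AlexInput
        -> Either String [tok]
scanAll scan = go
  where
    go input =
      case scan input 0 of                 -- 0 is the default start code
        AlexEOF                  -> Right []
        AlexError _              -> Left "lexical error"
        AlexSkip  input' _       -> go input'       -- no action: drop the match
        AlexToken input' len act ->
          (act (take len input) :) <$> go input'    -- token text is a prefix
```

Returning Either keeps the error case explicit. A continuation-style action type would instead receive the new input and decide for itself whether to scan again.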
This is pretty low-level stuff; you have complete flexibility about how you use the lexer, but there might be a fair amount of support code to write before you can actually use it. For this reason, we also provide a selection of wrappers that add some common functionality to this basic scheme. Wrappers are described in the next section.
There is another entry point, which is useful if your grammar contains any predicates (see Section 3.2.2.1, “Contexts”):
    alexScanUser
      :: user              -- predicate state
      -> AlexInput         -- The current input
      -> Int               -- The "start code"
      -> AlexReturn action

The extra argument, of some type user,
is passed to each predicate.
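As a rough sketch of how the user value might be consumed, the predicate below reads a flag from a hypothetical state record. It assumes the predicate type given in the Contexts section, user -> AlexInput -> Int -> AlexInput -> Bool, where the arguments are the user value, the input before the token, the token length, and the input after the token. The names LexerFlags, allowNested and nestedEnabled are invented for this example, and String again stands in for AlexInput.

```haskell
-- Hypothetical user state handed to alexScanUser.
data LexerFlags = LexerFlags
  { allowNested :: Bool
  }

-- Stand-in input type (an assumption for this sketch).
type AlexInput = String

-- A predicate that ignores the input entirely and only consults the flag.
nestedEnabled :: LexerFlags -> AlexInput -> Int -> AlexInput -> Bool
nestedEnabled flags _inputBefore _tokenLength _inputAfter = allowNested flags
```

A rule guarded by such a predicate would then match only when the caller passes a LexerFlags value with allowNested set to True as the first argument of alexScanUser.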