Writing a Parser subclass
We've seen in section Basic concepts > Building parsers that parsers are combined together to form more complex parsers. In this section, we'll go into more detail about how exactly this composition is done internally and how you could write your own Parser subclass with its own logic. Under the hood, the Parser class is derived into two categories of subclasses:
- Leaf classes, which don't hold a reference to other parsers. There are three of them: LiteralParser, RegexParser and CutParser.
- Composition classes like SequenceParser, TokenParser, PredicateParser, ActionParser, NonTerminalParser, etc. which, on the contrary, reference one or more subparsers.
The peg tag's role is to parse a peg expression and to generate the corresponding Parser instances. The expression 'a' | 'b' is converted by the peg tag into:
new AlternativeParser([
  new LiteralParser("a"),
  new LiteralParser("b")
])
&id+ is converted into:
new PredicateParser(
  new RepetitionParser(
    new NonTerminalParser("id"),
    [1, Infinity]
  ),
  true
)
You get the idea.
Every Parser subclass, whether standard or custom, must satisfy two constraints: 1) it must inherit from Parser, and 2) it must implement an exec method with the following signature:
exec(options: Options<Context>): Match | null;
The exec method is called when the parser is invoked. It must return null on failure and, on success, a Match object of the following type:
type Match = {
  from: number;
  to: number;
  children: any[];
};
The state of the parsing process at the time of invocation is expressed by the options argument, which carries information such as the current position, the input string, the skipping state (on or off) and the expected case sensitivity. The exhaustive list is described in API > Types. Important: For performance reasons, this object is never recreated and always directly mutated.
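To illustrate what sharing a single mutable options object means for a composition class, here is a minimal sketch. It does not use Pegase itself: Parser, Options, Lit and Seq are simplified stand-ins written for this example, and the field names input and from are assumptions. Restoring the position before returning is a convention chosen for the sketch, not a documented requirement.

```typescript
// Stand-ins for Pegase's types, reduced to what this sketch needs (assumed names).
type Match = { from: number; to: number; children: any[] };
type Options = { input: string; from: number };

abstract class Parser {
  abstract exec(options: Options): Match | null;
}

// Hypothetical leaf class: matches one fixed string at options.from.
class Lit extends Parser {
  constructor(private readonly text: string) {
    super();
  }
  exec(options: Options): Match | null {
    const to = options.from + this.text.length;
    return options.input.slice(options.from, to) === this.text
      ? { from: options.from, to, children: [] }
      : null;
  }
}

// Hypothetical composition class: runs its subparsers one after another.
class Seq extends Parser {
  constructor(private readonly parsers: Parser[]) {
    super();
  }
  exec(options: Options): Match | null {
    const start = options.from; // remember the entry position
    const children: any[] = [];
    let to = start;
    for (const parser of this.parsers) {
      options.from = to; // mutate the shared object instead of copying it
      const match = parser.exec(options);
      if (match === null) {
        options.from = start; // undo the mutation before reporting failure
        return null;
      }
      children.push(...match.children);
      to = match.to;
    }
    options.from = start; // leave the object as it was found
    return { from: start, to, children };
  }
}
```

Because no copy of options is ever made, the caller sees every change a subparser leaves behind, which is why the sketch saves the entry position and puts it back on both the success and the failure path.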
Warnings and failures must be emitted as side effects by pushing items to options.warnings and options.failures. Please refer to API > Parser.
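Putting these rules together, a leaf class could look roughly like the following sketch. It is self-contained rather than built on Pegase: Parser and Options are reduced stand-ins, the field names input, from and ignoreCase are assumptions, WordSetParser is a hypothetical class, and the failure item pushed to options.failures is a simplified placeholder rather than the library's real failure type.

```typescript
// Stand-ins for Pegase's types, reduced to what this sketch needs (assumed names).
type Match = { from: number; to: number; children: any[] };

type Options = {
  input: string; // the input string
  from: number; // the current position
  ignoreCase: boolean; // the expected case sensitivity
  warnings: any[]; // warnings, pushed as side effects
  failures: any[]; // failures, pushed as side effects
};

abstract class Parser {
  abstract exec(options: Options): Match | null;
}

// Hypothetical leaf class: succeeds if one of the given words starts at options.from.
class WordSetParser extends Parser {
  private readonly words: string[];

  constructor(words: Set<string>) {
    super();
    this.words = Array.from(words);
  }

  exec(options: Options): Match | null {
    for (const word of this.words) {
      const to = options.from + word.length;
      const slice = options.input.slice(options.from, to);
      const same = options.ignoreCase
        ? slice.toLowerCase() === word.toLowerCase()
        : slice === word;
      // Emitting the matched word as the only child is a choice made for this sketch.
      if (same) return { from: options.from, to, children: [word] };
    }
    // Placeholder failure item: the real shape is described in the API section.
    options.failures.push({ from: options.from, expected: this.words });
    return null;
  }
}
```

The method returns a Match on success, returns null on failure, and reports the failure only through options.failures, never by throwing.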
Great. Once you've written a custom Parser subclass, there are four options for using it, depending on your needs:
- You can create an explicit instance and inject it into a peg expression as a tag argument:
const p = new MyParser();
const g = peg`0 | 1 | ${p}`;
There is also the builder approach:
const _ = data => new MyParser(data);
const g = peg`0 | 1 | ${_("foo")} | ${_("bar")}`;
- If the class relies on one specific attribute that's not a number, a string, a function, a RegExp or a Parser, you can make Pegase generate instances automatically by injecting that attribute directly into the peg expression and casting it into a Parser using a plugin:
peg.plugins.push({
  castParser(set) {
    if (set instanceof Set)
      return new MyParser(set);
  }
});
const g = peg`42 | ${new Set(["a", "b"])}`;
- You can define custom directives that generate instances of it:
peg.plugins.push({
  directives: {
    myLeafParser(_, x, y) {
      return new MyLeafParser(x, y);
    },
    myCompParser(parser) {
      return new MyCompParser(parser);
    }
  }
});
const p = peg`
  a: b @myCompParser
  b: '(' @@myLeafParser(4, 5) ')'
`;
- If your class is a singleton, you can bind its instance to an external non-terminal:
peg.plugins.push({
  resolve: {
    myparser: new MyParser()
  }
});
const g = peg`42 | myparser`;