Skip to content

Home

Pegase is a PEG parser generator for JavaScript and TypeScript. It's:

  • Inline, meaning grammars are directly expressed as tagged template literals. No generation step, no CLI. Pegase works in symbiosis with JS.
  • Fast. Pegase is heavily optimized to be extremely fast while providing an extensive range of features.
  • Complete. Pegase has everything you will ever need: an elegant grammar syntax with lots of flexibility, semantic actions, parametrized rules, support for native regexps, error recovery, warnings, integrated AST generation and visitors, cut operator, back references, grammar merging, and a lot more.
  • Lightweight. Pegase is a zero-dependency package, and weights around 9kB gzipped.
  • Intuitive, in that it lets you express complex processes in very simple ways. You will never feel lost.
  • Extensible: You can define your own Parser subclasses, add plugins, write custom directives, etc.

⚠️ This library is still under development. This is a pre-release but some functionalities might still change.


Quick start

First, add Pegase as a dependency:

npm install pegase or yarn add pegase

Next, import the template literal tag that will become your new best friend and there you go, ready to write your first peg expression.

import peg from "pegase";

const parser = peg`your peg expression`;

What about a parser that recognizes a binary digit ? That's a simple alternative:

const bit = peg`0 | 1`;

Ok, bit is now a Parser instance, which has 4 methods : parse, test, value and children. Let's take a look at test. It takes a string input and returns true or false (whether the string conforms to the pattern or not).

if (bit.test("1"))
  console.log("It's a match!");

What about an array of bits like [0, 1, 1, 0, 1] ?

const bitArray = peg`'[' (0 | 1) % ',' ']'`;

The % operator can be read as "separated by". Let's test it:

if (bitArray.test(" [ 0,1,    1 ,0 , 1 ]  "))
  console.log("It's a match!");

As you might have spotted, whitespaces are handled automatically by default (it can be changed). The way this works is pretty simple: whitespace characters are parsed and discarded before every terminal parser (like '[', 1, etc.). This process is called skipping. By default, every parser also adds an implicit "end of input" symbol ($) at the end of the peg expression, which is a terminal, thus the trailing space is skipped too and the whole string matches.

Good, but so far, a RegExp could have done the job. Things get interesting when we add in non-terminals. A non-terminal is an identifier that refers to a more complex peg expression which will be invoked every time the identifier is used. You can think of non-terminals as variables whose value is a parser, initialized in what we call rules. This allows for recursive patterns. Let's say we want to match possibly infinitely-nested bit arrays:

const nestedBitArray = peg`  
  bitArray: '[' (bit | bitArray) % ',' ']'
  bit: 0 | 1
`;

We have two rules: bitArray and bit. A collection of rules is called a grammar. The generated parser, nestedBitArray, always points to the topmost rule, bitArray in this case.

Testing it:

nestedBitArray.test("[[0]");          // false
nestedBitArray.test("[ [1, 0], 1] "); // true
nestedBitArray.test(" [0, [[0] ]]");  // true

If we already defined bit as a JS variable, we're not obligated to redefine it as a rule. We can simply inject it as a tag argument:

const bit = peg`0 | 1`;

const nestedBitArray = peg`  
  bitArray: '[' (${bit} | bitArray) % ',' ']'
`;

Okay, the test method is fun but what if you want to do something more elaborated like collecting values, reading warnings or parse failures? The parse method is what you're asking for. It returns a result object containing all the info you might be interested in after a parsing. In fact, all other Parser methods (test, value and children) are wrappers around parse.

const result = nestedBitArray.parse("[[0]");
if(!result.success)
  console.log(result.log());

This will output:

(1:5) Failure: Expected "," or "]"

> 1 | [[0]
    |     ^

What we are going to do next is collecting the bits we matched in an array. Every parser and subparser has the ability to emit values on success. These values are called children and can be processed in parent parsers, which in turn emit children, etc. You can think of children as Pegase's version of synthesized attributes, values that bubble from bottom to top.

Back to the grammar. By writing "0" instead of 0 or '0', it will emit the matched substring as a single child (same for 1):

const bit = peg`"0" | "1"`;
bit.parse("1").children; // ["1"]

Or directly:

bit.children("1"); // ["1"]

children are automatically concatenated in case of sequence and repetition:

const bitArray = peg`'[' ("0" | "1") % ',' ']'`;
bitArray.children("[0, 1, 1, 0, 1]"); // ["0", "1", "1", "0", "1"]

You can wrap any peg expression in semantic actions by inserting functions on the right-hand side of the expression. These functions will be called when the wrapped expression successfully matches. Actions can, among many other things, read their subparser's children, process them and emit new ones. Let's wrap our entire expression in an action and console.log the children from there:

import peg, { $children } from "pegase";

const bitArray = peg`
  '[' ("0" | "1") % ',' ']' ${() => console.log($children())}
`;

bitArray.parse("[0, 1, 1, 0, 1]"); // console.log: ["0", "1", "1", "0", "1"]

If we return a value in our semantic action, it will be emitted as a single child in replacement of the previous children. Let's use this to sum our bits:

const bitArray = peg`
  '[' ("0" | "1") % ',' ']' ${() => $children().reduce((a, b) => a + Number(b), 0)}
`;

bitArray.children("[0, 1, 1, 0, 1]"); // [3]

When a parser emits a single child, that child is said to be the value of the parser:

bitArray.value("[0, 1, 1, 0, 1]"); // 3

Some behaviors are so commonly used that they are abstracted away in reusable bricks called directives. Similarly to semantic actions, un directive wraps a parser and produces another parser. Here is an example of a standard directive, @reverse, that... well, reverses the children:

const bitArray = peg`'[' ("0" | "1") % ',' ']' @reverse`;
bitArray.children("[0, 1, 1, 0, 1]"); // ["1", "0", "1", "1", "0"]

Last challenge: what about matching an array of 0 or an array of 1, but not the two mixed up? You could write:

const bitArray = peg`'[' (0 % ',' | 1 % ',') ']'`;

Or you could factor the whole expression by using parametrized rules:

const bitArray = peg`
  bitArray: '[' seq(0) | seq(1) ']'
  seq(item): item % ','
`;

Let's try it out:

const result = bitArray.parse("[1, 0, 1]");
if(!result.success)
  console.log(result.log());
(1:5) Failure: Expected "1"

> 1 | [1, 0, 1]
    |     ^

Try-it out

You can try everything out while reading this website by accessing the JS console tab. The peg tag and hooks will be directly available in your namespace. All other named exports from pegase are available as properties of _. Have fun!

Console demo

pegase

Back to top