BNFC-2.9.6.2: A compiler front-end generator.
MaintainerKait Lam
Safe HaskellSafe-Inferred
LanguageHaskell2010

BNFC.Backend.TreeSitter.MatchesEmpty

Description

This module identifies and transforms rules which match the empty string, as required by constraints of Treesitter.

Treesitter requires that rules do not match the empty string. Although this is not made explicit in their documentation, rules which match empty will be thoroughly rejected by the tree-sitter compiler.

For example, this Treesitter grammar is not allowed because $.listItem could match the empty string.

list: $ => seq("[", $.listItem, "]"),
listItem: $ => choice(
  seq(),
  "item"
),

Instead, Treesitter wants empty matches to be moved to use-sites of that rule. The above grammar would be rewritten as:

list: $ => seq("[", optional($.listItem), "]"),
listItem: $ => choice(
  choice(),
  "item"
),

Unfortunately, the style Treesitter needs is quite incompatible with LBNF. LBNF has no way to express "choice" occuring within the right hand side of a rule, which forces any choice (including potential optionality) to happen at the top-level of a rule. This is in direct conflict with what Treesitter expects.

This modules bridges the gap by transforming LBNF's rules using process outlined above. This happens in two steps: first, we compute which rules could match empty by using a fixpoint algorithm, then, we transform the rules by eliminating empty matches from all rules and wrapping non-terminals in optional if their rule could match empty. BNFC's BNFC.CF types have no notion of "optional" within the RHS, so this module also introduces OptSym to represent this.

Of course, this transformation affects the parse tree for certain strings. Users of BNFC who want to generate Treesitter grammars should be aware of this change.

For users of this library, the main functions of interest are in the Fixpoint and transformations section.

Synopsis

Basic types

type Sym = Either Cat String Source #

A symbol which is either a non-terminal (Cat) or terminal token name (String). A list of these Syms is a sentential form, SentForm.

newtype KnownEmpty Source #

Set of Sym which are known to match the empty string.

Constructors

KnownEmpty (Set Sym) 

Instances

Instances details
Show KnownEmpty Source # 
Instance details

Defined in BNFC.Backend.TreeSitter.MatchesEmpty

Methods

showsPrec :: Int -> KnownEmpty -> ShowS

show :: KnownEmpty -> String

showList :: [KnownEmpty] -> ShowS

Eq KnownEmpty Source # 
Instance details

Defined in BNFC.Backend.TreeSitter.MatchesEmpty

Methods

(==) :: KnownEmpty -> KnownEmpty -> Bool

(/=) :: KnownEmpty -> KnownEmpty -> Bool

isKnownEmpty :: Sym -> KnownEmpty -> Bool Source #

Returns whether the given symbol matches the empty string, according to the given known empty set.

data OptSym Source #

Represents a Sym which might be wrapped in a optional(...) function in the produced Treesitter grammar.

Constructors

Optional Sym

A Sym which is wrapped in optional([SYM]), indicating that it should match [SYM] or the empty string.

NonOptional Sym

A plain Sym which matches only the Sym itself.

Instances

Instances details
Show OptSym Source # 
Instance details

Defined in BNFC.Backend.TreeSitter.MatchesEmpty

Methods

showsPrec :: Int -> OptSym -> ShowS

show :: OptSym -> String

showList :: [OptSym] -> ShowS

Eq OptSym Source # 
Instance details

Defined in BNFC.Backend.TreeSitter.MatchesEmpty

Methods

(==) :: OptSym -> OptSym -> Bool

(/=) :: OptSym -> OptSym -> Bool

type OptSentForm = [OptSym] Source #

A sentential form where each symbol may be wrapped in an optional function. Analagous to SentForm, but containing OptSym instead of Sym.

"Matches empty" type

data MatchesEmpty a Source #

Represents whether the wrapped value matches the empty string, or whether it is known to be non-empty.

Because this analysis is done on context-free grammars, the analysis is precise. A value of MatchesEmpty will accept the empty string, and a value of NonEmpty will not. There is no uncertainty in this analysis.

Constructors

MatchesEmpty a

The contained value accepts the empty string.

NonEmpty a

The contained value does not accept the empty string.

Instances

Instances details
Show a => Show (MatchesEmpty a) Source # 
Instance details

Defined in BNFC.Backend.TreeSitter.MatchesEmpty

Methods

showsPrec :: Int -> MatchesEmpty a -> ShowS

show :: MatchesEmpty a -> String

showList :: [MatchesEmpty a] -> ShowS

Eq a => Eq (MatchesEmpty a) Source # 
Instance details

Defined in BNFC.Backend.TreeSitter.MatchesEmpty

Methods

(==) :: MatchesEmpty a -> MatchesEmpty a -> Bool

(/=) :: MatchesEmpty a -> MatchesEmpty a -> Bool

Sequential operators

seqMatchesEmpty :: Semigroup a => MatchesEmpty a -> MatchesEmpty a -> MatchesEmpty a Source #

Combines the two values in sequence. Returns MatchesEmpty if both values are MatchesEmpty, otherwise returns NonEmpty. In all cases, the inner values are joined using the semigroup operation.

seqListMatchesEmpty :: Monoid a => [MatchesEmpty a] -> MatchesEmpty a Source #

Combines the list of values in sequence (i.e., seq(x1, ..., xn)), returning MatchesEmpty if all are MatchesEmpty, otherwise NonEmpty. Inner values are joined using the semigroup operation.

Alternation operators

choiceMatchesEmpty :: Semigroup a => MatchesEmpty a -> MatchesEmpty a -> MatchesEmpty a Source #

Combines the two values as a parallel choice. Returns NonEmpty if both values are NonEmpty, otherwise returns MatchesEmpty. In all cases, the inner values are joined using the semigroup operation.

choiceListMatchesEmpty :: Monoid a => [MatchesEmpty a] -> MatchesEmpty a Source #

Combines the list of values in choice (i.e., choice(x1, ..., xn)), returning NonEmpty if all are NonEmpty, otherwise MatchesEmpty. Inner values are joined using the semigroup operation.

Analysis of non-terminals

possiblyEmptySym :: KnownEmpty -> Sym -> OptSym Source #

Determines whether the given symbol can match empty, according to the given known empty set. If it can match empty, the symbol is returned as Optional to indicate that uses of the symbol should match empty.

TODO: This does not yet handle tokens (terminals) which might be empty. At the moment, all terminals are assumed to be non-empty.

possiblyEmptyRule :: KnownEmpty -> SentForm -> MatchesEmpty [OptSentForm] Source #

Determines whether the given sentential form could match empty.

The returned list is a choice list of OptSentForm, with Optional applied to symbols which are within the known empty set. When combined using choice, the returned list is equivalent to the original rule, except that the returned list has empty matches removed. If the rule previously matched empty, this is encoded as the MatchesEmpty variant.

Implementation Detail: blah

possiblyEmptyCat :: KnownEmpty -> (Cat, [Rule]) -> MatchesEmpty [OptSentForm] Source #

Determines whether the given non-terminal category with the given production rules could match empty.

The returned list is a choice list of OptSentForm, with Optional applied to symbols which are within the known empty set. When combined using choice, the returned list is equivalent to the original rules, except that the returned list has empty matches removed. If the category previously matched empty, this is encoded as the MatchesEmpty variant.

possiblyEmptyCats :: [(Cat, [Rule])] -> KnownEmpty -> KnownEmpty Source #

Updates the set of known empty symbols according to the given grammar. Returns the new set, which is made up of the previous set unioned with any newly-discovered empty matching symbols.

This is one step of the fixpoint computation in fixPointKnownEmpty.

Fixpoint and transformations

For users of this module, these are the main functions of interest.

fixPointKnownEmpty :: [(Cat, [Rule])] -> KnownEmpty Source #

Computes the complete set of symbols which are known to match empty, using the given non-terminal production rules.

This should be given the list of parsable grammar rules, e.g., from 'BNFC.CF.ruleGroups.

transformEmptyMatches :: KnownEmpty -> SentForm -> [OptSentForm] Source #

Transforms the given sentence such that the returned sentential form does not match the empty string, and contains Optional terms where needed.

The returned list is a choice list which is equivalent to the given sentential form, but for the (potential) subtraction of empty matches.

Optional is inserted around symbols which previously matched the empty string (according to the given KnownEmpty). This compensates for transformEmptyMatches being applied to other rules of the grammar.

After this transformation is applied to all rules of the grammar, the grammar should accept an identical language. However, the exact nodes which match certain strings might change.