| Maintainer | Kait Lam |
|---|---|
| Safe Haskell | Safe-Inferred |
| Language | Haskell2010 |
BNFC.Backend.TreeSitter.MatchesEmpty
Description
This module identifies and transforms rules which match the empty string, as required by constraints of Treesitter.
Treesitter requires that rules do not match the empty string. Although this is not made explicit in their documentation, rules which match empty will be thoroughly rejected by the tree-sitter compiler.
For example, this Treesitter grammar is not allowed because $.listItem could match
the empty string.
list: $ => seq("[", $.listItem, "]"),
listItem: $ => choice(
seq(),
"item"
),Instead, Treesitter wants empty matches to be moved to use-sites of that rule. The above grammar would be rewritten as:
list: $ => seq("[", optional($.listItem), "]"),
listItem: $ => choice(
choice(),
"item"
),Unfortunately, the style Treesitter needs is quite incompatible with LBNF. LBNF has no way to express "choice" occuring within the right hand side of a rule, which forces any choice (including potential optionality) to happen at the top-level of a rule. This is in direct conflict with what Treesitter expects.
This modules bridges the gap by transforming LBNF's rules using process
outlined above. This happens in two steps: first, we compute which rules could
match empty by using a fixpoint algorithm, then, we transform the rules by
eliminating empty matches from all rules and wrapping non-terminals in
optional if their rule could match empty. BNFC's BNFC.CF types have no
notion of "optional" within the RHS, so this module also introduces OptSym to
represent this.
Of course, this transformation affects the parse tree for certain strings. Users of BNFC who want to generate Treesitter grammars should be aware of this change.
For users of this library, the main functions of interest are in the Fixpoint and transformations section.
Synopsis
- type Sym = Either Cat String
- newtype KnownEmpty = KnownEmpty (Set Sym)
- isKnownEmpty :: Sym -> KnownEmpty -> Bool
- knownEmptySet :: KnownEmpty -> Set Sym
- data OptSym
- type OptSentForm = [OptSym]
- data MatchesEmpty a
- = MatchesEmpty a
- | NonEmpty a
- matchesEmpty :: MatchesEmpty a -> Bool
- unMatchesEmpty :: MatchesEmpty a -> a
- seqMatchesEmpty :: Semigroup a => MatchesEmpty a -> MatchesEmpty a -> MatchesEmpty a
- seqListMatchesEmpty :: Monoid a => [MatchesEmpty a] -> MatchesEmpty a
- choiceMatchesEmpty :: Semigroup a => MatchesEmpty a -> MatchesEmpty a -> MatchesEmpty a
- choiceListMatchesEmpty :: Monoid a => [MatchesEmpty a] -> MatchesEmpty a
- possiblyEmptySym :: KnownEmpty -> Sym -> OptSym
- possiblyEmptyRule :: KnownEmpty -> SentForm -> MatchesEmpty [OptSentForm]
- possiblyEmptyCat :: KnownEmpty -> (Cat, [Rule]) -> MatchesEmpty [OptSentForm]
- possiblyEmptyCats :: [(Cat, [Rule])] -> KnownEmpty -> KnownEmpty
- fixPointKnownEmpty :: [(Cat, [Rule])] -> KnownEmpty
- transformEmptyMatches :: KnownEmpty -> SentForm -> [OptSentForm]
Basic types
newtype KnownEmpty Source #
Set of Sym which are known to match the empty string.
Constructors
| KnownEmpty (Set Sym) |
Instances
| Show KnownEmpty Source # | |
Defined in BNFC.Backend.TreeSitter.MatchesEmpty Methods showsPrec :: Int -> KnownEmpty -> ShowS show :: KnownEmpty -> String showList :: [KnownEmpty] -> ShowS | |
| Eq KnownEmpty Source # | |
Defined in BNFC.Backend.TreeSitter.MatchesEmpty | |
isKnownEmpty :: Sym -> KnownEmpty -> Bool Source #
Returns whether the given symbol matches the empty string, according to the given known empty set.
knownEmptySet :: KnownEmpty -> Set Sym Source #
Represents a Sym which might be wrapped in a optional(...) function
in the produced Treesitter grammar.
Constructors
| Optional Sym | A |
| NonOptional Sym |
type OptSentForm = [OptSym] Source #
"Matches empty" type
data MatchesEmpty a Source #
Represents whether the wrapped value matches the empty string, or whether it is known to be non-empty.
Because this analysis is done on context-free grammars, the analysis is
precise. A value of MatchesEmpty will accept the empty string, and a
value of NonEmpty will not. There is no uncertainty in this analysis.
Constructors
| MatchesEmpty a | The contained value accepts the empty string. |
| NonEmpty a | The contained value does not accept the empty string. |
Instances
| Show a => Show (MatchesEmpty a) Source # | |
Defined in BNFC.Backend.TreeSitter.MatchesEmpty Methods showsPrec :: Int -> MatchesEmpty a -> ShowS show :: MatchesEmpty a -> String showList :: [MatchesEmpty a] -> ShowS | |
| Eq a => Eq (MatchesEmpty a) Source # | |
Defined in BNFC.Backend.TreeSitter.MatchesEmpty Methods (==) :: MatchesEmpty a -> MatchesEmpty a -> Bool (/=) :: MatchesEmpty a -> MatchesEmpty a -> Bool | |
matchesEmpty :: MatchesEmpty a -> Bool Source #
unMatchesEmpty :: MatchesEmpty a -> a Source #
Sequential operators
seqMatchesEmpty :: Semigroup a => MatchesEmpty a -> MatchesEmpty a -> MatchesEmpty a Source #
Combines the two values in sequence. Returns MatchesEmpty if both values
are MatchesEmpty, otherwise returns NonEmpty. In all cases, the inner
values are joined using the semigroup operation.
seqListMatchesEmpty :: Monoid a => [MatchesEmpty a] -> MatchesEmpty a Source #
Combines the list of values in sequence (i.e., seq(x1, ..., xn)), returning
MatchesEmpty if all are MatchesEmpty, otherwise NonEmpty. Inner values
are joined using the semigroup operation.
Alternation operators
choiceMatchesEmpty :: Semigroup a => MatchesEmpty a -> MatchesEmpty a -> MatchesEmpty a Source #
Combines the two values as a parallel choice. Returns NonEmpty if
both values are NonEmpty, otherwise returns MatchesEmpty. In all
cases, the inner values are joined using the semigroup operation.
choiceListMatchesEmpty :: Monoid a => [MatchesEmpty a] -> MatchesEmpty a Source #
Combines the list of values in choice (i.e., choice(x1, ..., xn)), returning
NonEmpty if all are NonEmpty, otherwise MatchesEmpty. Inner values
are joined using the semigroup operation.
Analysis of non-terminals
possiblyEmptySym :: KnownEmpty -> Sym -> OptSym Source #
Determines whether the given symbol can match empty, according to the
given known empty set. If it can match empty, the symbol is returned as
Optional to indicate that uses of the symbol should match empty.
TODO: This does not yet handle tokens (terminals) which might be empty. At the moment, all terminals are assumed to be non-empty.
possiblyEmptyRule :: KnownEmpty -> SentForm -> MatchesEmpty [OptSentForm] Source #
Determines whether the given sentential form could match empty.
The returned list is a choice list of OptSentForm, with Optional
applied to symbols which are within the known empty set. When combined using
choice, the returned list is equivalent to the original rule, except that
the returned list has empty matches removed. If the rule previously matched
empty, this is encoded as the MatchesEmpty variant.
Implementation Detail: blah
possiblyEmptyCat :: KnownEmpty -> (Cat, [Rule]) -> MatchesEmpty [OptSentForm] Source #
Determines whether the given non-terminal category with the given production rules could match empty.
The returned list is a choice list of OptSentForm, with Optional
applied to symbols which are within the known empty set. When combined using
choice, the returned list is equivalent to the original rules, except that
the returned list has empty matches removed. If the category previously
matched empty, this is encoded as the MatchesEmpty variant.
possiblyEmptyCats :: [(Cat, [Rule])] -> KnownEmpty -> KnownEmpty Source #
Updates the set of known empty symbols according to the given grammar. Returns the new set, which is made up of the previous set unioned with any newly-discovered empty matching symbols.
This is one step of the fixpoint computation in fixPointKnownEmpty.
Fixpoint and transformations
For users of this module, these are the main functions of interest.
fixPointKnownEmpty :: [(Cat, [Rule])] -> KnownEmpty Source #
Computes the complete set of symbols which are known to match empty, using the given non-terminal production rules.
This should be given the list of parsable grammar rules, e.g., from 'BNFC.CF.ruleGroups.
transformEmptyMatches :: KnownEmpty -> SentForm -> [OptSentForm] Source #
Transforms the given sentence such that the returned sentential form does
not match the empty string, and contains Optional terms where needed.
The returned list is a choice list which is equivalent to the given sentential form, but for the (potential) subtraction of empty matches.
Optional is inserted around symbols which previously matched the empty
string (according to the given KnownEmpty). This compensates for
transformEmptyMatches being applied to other rules of the grammar.
After this transformation is applied to all rules of the grammar, the grammar should accept an identical language. However, the exact nodes which match certain strings might change.