Original PEG Grammar (some parts of this documentation have been taken from the PEGJS project)
On the top level, the grammar consists of rules (in our example, there are five of them). Each rule has a name (e.g. number) that identifies the rule, and a parsing expression (e.g. digits:[0-9]+) that defines a pattern to match against the input text. The parsing starts at the first rule, which is also called the start rule.
A rule consists of a name followed by an equals sign (“=”) and a parsing expression.
The parsing expressions of the rules are used to match the input text to the grammar. There are various types of expressions — matching characters or character classes, indicating optional parts and repetition, etc. Expressions can also contain references to other rules. See detailed description below.
Parsing Expression Types
There are several types of parsing expressions, some of them containing subexpressions and thus forming a recursive structure:
"literal" 'literal' Match exact literal string and return it. The string syntax is the same as in JavaScript.
. Match any character and return it as a string.
[characters] Match one character from a set and return it as a string. The characters in the list can be escaped in exactly the same way as in a JavaScript string. The list of characters can also contain ranges (e.g. [a-z] means “all lowercase letters”).
rule Match a parsing expression of a rule recursively and return its match result.
( expression ) Match a subexpression and return its match result.
expression * Match zero or more repetitions of the expression and return their match results in an array. The matching is greedy, i.e. the parser tries to match the expression as many times as possible.
expression + Match one or more repetitions of the expression and return their match results in an array. The matching is greedy, i.e. the parser tries to match the expression as many times as possible.
expression ? Try to match the expression. If the match succeeds, return its match result, otherwise return an empty string.
& expression Try to match the expression. If the match succeeds, just return an empty string and do not advance the parser position, otherwise consider the match failed.
! expression Try to match the expression. If the match does not succeed, just return an empty string and do not advance the parser position, otherwise consider the match failed.
expression1 expression2 ... expressionN Match a sequence of expressions and return their match results in an array.
expression1 / expression2 / ... / expressionN Try to match the first expression; if it does not succeed, try the second one, etc. Return the match result of the first successfully matched expression. If no expression matches, consider the match failed.
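The expression types above can be sketched as a handful of small combinator functions. This is only an illustrative sketch in Python, not the PEG Builder or PEGJS implementation: every function name here (literal, char_class, any_char, sequence, choice, repeat, optional, and_pred, not_pred) is hypothetical, and each parser is assumed to take (text, pos) and return (result, new_pos) on success or None on failure.

```python
# Illustrative PEG combinators (hypothetical names, not a real library API).
# A parser is a function (text, pos) -> (result, new_pos) or None on failure.

def literal(s):
    """"literal": match an exact string and return it."""
    def parse(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse

def char_class(chars):
    """[characters]: match one character from a set."""
    def parse(text, pos):
        if pos < len(text) and text[pos] in chars:
            return text[pos], pos + 1
        return None
    return parse

def any_char(text, pos):
    """. : match any single character."""
    return (text[pos], pos + 1) if pos < len(text) else None

def sequence(*parsers):
    """e1 e2 ... eN: all must match in order; results are returned in a list."""
    def parse(text, pos):
        results = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            results.append(value)
        return results, pos
    return parse

def choice(*parsers):
    """e1 / e2 / ... / eN: ordered choice, the first success wins."""
    def parse(text, pos):
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return parse

def repeat(parser, minimum=0):
    """e* (minimum=0) and e+ (minimum=1): greedy repetition."""
    def parse(text, pos):
        results = []
        while True:
            r = parser(text, pos)
            if r is None:
                break
            value, new_pos = r
            results.append(value)
            if new_pos == pos:  # stop after an empty match to avoid looping forever
                break
            pos = new_pos
        return (results, pos) if len(results) >= minimum else None
    return parse

def optional(parser):
    """e?: return the match result, or an empty string if there is no match."""
    def parse(text, pos):
        r = parser(text, pos)
        return r if r is not None else ("", pos)
    return parse

def and_pred(parser):
    """&e: succeed without consuming input if e matches."""
    def parse(text, pos):
        return ("", pos) if parser(text, pos) is not None else None
    return parse

def not_pred(parser):
    """!e: succeed without consuming input if e does NOT match."""
    def parse(text, pos):
        return ("", pos) if parser(text, pos) is None else None
    return parse

digits = repeat(char_class("0123456789"), minimum=1)
print(digits("123abc", 0))                             # (['1', '2', '3'], 3)
print(choice(literal("a"), literal("ab"))("ab", 0))    # ('a', 1)
```

The last line shows why the order of alternatives matters: the choice commits to 'a' as soon as it matches and never tries 'ab'.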
PEG Builder Extended Grammar
Some parts of the original PEG grammar have been changed; this is a brief list:
- a rule can be defined using the equals sign '=' or the original arrow sign '<-' (e.g. program = 'a'+ or program <- 'a'+)
- a character can also be defined using these escape sequences:
  1- '\\#' DecDigit+ 'd' (a decimal number with a variable number of digits, e.g. \#32d or \#000032d)
  2- '\\#' HexDigit+ 'x' (a hexadecimal number with a variable number of digits, e.g. \#AAx or \#A16x)
  3- '\\u' HexDigit HexDigit HexDigit HexDigit (a hexadecimal number with exactly 4 digits, e.g. \u0020)
- the grammar can have additional information at the beginning of the file:
  - @Name --> the grammar name (@Name = 'My first grammar')
  - @By --> the grammar creator (@By = 'Lorenzi Davide')
  - @CaseSensitive --> is the grammar case sensitive? Default is True (@CaseSensitive = False)
  - @Memoization --> should the engine use memoization? Default is True (@Memoization = False)
- annotations for rules. Annotations are placed before the rule, inside curly braces:
  - @Error --> a better error description for the following rule (@Error = 'Wrong rule definition!')
  - @Loose --> should the result of this rule be discarded? (@Loose)
  - @Col --> the column or column range in which the rule is valid (@Col = 10-20 or @Col = 1)
  - @ExitIfKO --> stop the parsing if this rule fails
  - @ExitIfOK --> stop the parsing if this rule succeeds
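To make the three character escape forms concrete, here is a small Python sketch that decodes them into characters. It is not PEG Builder's own code: the function name decode_escapes is hypothetical, and the sketch assumes the hexadecimal form is tried before the decimal one, so that a sequence such as \#0dx is read as hexadecimal 0D rather than as decimal 0 followed by a literal x. How the real engine resolves that overlap is not stated in this document.

```python
import re

# Assumption: the hex form is tried first, so "\#0dx" decodes as 0x0D.
_ESCAPE = re.compile(
    r"\\#([0-9A-Fa-f]+)x"      # \#<hex digits>x
    r"|\\#([0-9]+)d"           # \#<decimal digits>d
    r"|\\u([0-9A-Fa-f]{4})"    # \u<exactly four hex digits>
)

def decode_escapes(s):
    """Replace every escape sequence in s with the character it denotes."""
    def replace(match):
        hex_digits, dec_digits, u_digits = match.groups()
        if dec_digits is not None:
            return chr(int(dec_digits, 10))
        return chr(int(hex_digits if hex_digits is not None else u_digits, 16))
    return _ESCAPE.sub(replace, s)

# All three forms below denote the space character (code point 32).
print(repr(decode_escapes(r"\#32d")))    # ' '
print(repr(decode_escapes(r"\#20x")))    # ' '
print(repr(decode_escapes(r"\u0020")))   # ' '
```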
Nice things:
- EOF = !. (end of file: matches when no more characters are found)
- crlf = '\r\n' / '\r' / '\n' (end of line)
- EatLine = (!crlf !EOF .)* (consume characters until a line break or the end of the file is reached)
- Printable = [\#20x-\#7Ex\#A0x]
- PrintableExt = [\#a1x-\#ffx]
- AlphaNumeric = [\#30x-\#39x\#41x-\#5ax\#61x-\#7ax]
- WhiteSpace = [\#09x-\#0dx\#20x\#a0x]
- AllLatin = [\#41x-\#5ax\#61x-\#7ax\#aax\#b5x\#bax\#c0x-\#d6x\#d8x-\#f6x\#f8x-\#024fx\#1e00x-\#1effx\#2c60x-\#2c7fx\#a720x-\#a7ffx]
- AllWhiteSpace = [\#09x-\#0dx\#20x\#85x\#a0x\#1680x\#180ex\#2000x-\#200ax\#2028x\#2029x\#202fx\#205fx\#3000x]
- AllNewLine = [\#0ax\#0dx\#2028x\#2029x]
- EuroSign = [\#20acx]
- PrintableAnsi = [\#20x-\#7ex\#a0x-\#ffx]
- ControlCodes = [\#00x-\#1fx\#7fx-\#9fx]
- BasicLatin = [\#00x-\#7fx]
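The behaviour of the crlf and EatLine rules can be sketched in plain Python. This is only an approximation under stated assumptions: the function names match_crlf and eat_line are hypothetical, and eat_line returns the consumed text as a single string, whereas a PEG engine would return an array of per-character match results.

```python
def match_crlf(text, pos):
    """crlf = '\\r\\n' / '\\r' / '\\n': ordered choice, so '\\r\\n' is tried first.

    Returns the position after the line break, or None if there is none at pos.
    """
    for ending in ("\r\n", "\r", "\n"):
        if text.startswith(ending, pos):
            return pos + len(ending)
    return None

def eat_line(text, pos=0):
    """EatLine = (!crlf !EOF .)*: consume until a line break or end of input.

    The line break itself is not consumed. Returns (consumed_text, new_pos).
    """
    start = pos
    while pos < len(text) and match_crlf(text, pos) is None:
        pos += 1
    return text[start:pos], pos

line, pos = eat_line("first line\r\nsecond line")
print(repr(line), pos)                 # 'first line' 10
print(match_crlf("first line\r\nsecond line", pos))   # 12
```

Because the repetition may match zero times, eat_line always succeeds; on an empty line it simply returns an empty string without moving the position.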
Grammar definition (in PEG form)