PEG Builder


Original PEG Grammar (some parts of this documentation have been taken from the PEGJS project)

On the top level, the grammar consists of rules (in our example, there are five of them). Each rule has a name (e.g. number) that identifies the rule, and a parsing expression (e.g. digits:[0-9]+) that defines a pattern to match against the input text. The parsing starts at the first rule, which is also called the start rule.

A rule is defined by its name, followed by an equals sign (“=”) and a parsing expression.

The parsing expressions of the rules are used to match the input text against the grammar. There are various types of expressions: matching characters or character classes, indicating optional parts and repetition, etc. Expressions can also contain references to other rules. See the detailed description below.

Parsing Expression Types

There are several types of parsing expressions, some of them containing subexpressions and thus forming a recursive structure:

"literal"
'literal'
Match an exact literal string and return it. The string syntax is the same as in JavaScript.

.
Match any character and return it as a string.

[characters]
Match one character from a set and return it as a string. The characters in the list can be escaped in exactly the same way as in a JavaScript string. The list of characters can also contain ranges (e.g. [a-z] means “all lowercase letters”).

rule
Match a parsing expression of a rule recursively and return its match result.

( expression )
Match a subexpression and return its match result.

expression *
Match zero or more repetitions of the expression and return their match results in an array. The matching is greedy, i.e. the parser tries to match the expression as many times as possible.

expression +
Match one or more repetitions of the expression and return their match results in an array. The matching is greedy, i.e. the parser tries to match the expression as many times as possible.
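
The greedy behaviour of the two repetition operators can be illustrated with a small Python sketch. This is not PEG Builder's engine: the helper name match_repeat and its signature are hypothetical, and a one-character regular expression stands in for the repeated expression.

```python
import re

def match_repeat(char_class, text, pos, minimum):
    """Greedily match a one-character regex class, as `expression *` / `expression +` would.

    Returns (list_of_matches, new_position), or None if fewer than
    `minimum` repetitions matched. Hypothetical helper, for illustration only.
    """
    one = re.compile(char_class)
    results = []
    while True:
        m = one.match(text, pos)
        if m is None or m.end() == pos:  # no match, or an empty match that would loop forever
            break
        results.append(m.group())
        pos = m.end()
    return (results, pos) if len(results) >= minimum else None

star_result = match_repeat("[0-9]", "abc", 0, minimum=0)      # `*` succeeds with an empty array
plus_result = match_repeat("[0-9]", "abc", 0, minimum=1)      # `+` fails (None)
digits_result = match_repeat("[0-9]", "123abc", 0, minimum=1)

# Greedy: 'a'* takes every "a", so an 'a' placed right after it finds nothing left.
taken, after = match_repeat("a", "aaa", 0, minimum=0)
trailing_a_matches = "aaa".startswith("a", after)
```

In standard PEG semantics the repetition does not give characters back, which is why a pattern such as 'a'* 'a' cannot succeed.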

expression ?
Try to match the expression. If the match succeeds, return its match result, otherwise return an empty string.

& expression
Try to match the expression. If the match succeeds, just return an empty string and do not advance the parser position, otherwise consider the match failed.

! expression
Try to match the expression. If the match does not succeed, just return an empty string and do not advance the parser position, otherwise consider the match failed.
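
The two predicates only look ahead: they report success or failure without consuming input. A minimal Python sketch of that idea follows, assuming a matcher is a function taking (text, pos) and returning (result, new_pos) or None on failure. The names literal, and_predicate and not_predicate are hypothetical and not part of PEG Builder.

```python
def literal(s):
    """Build a matcher for an exact string."""
    def matcher(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return matcher

def and_predicate(matcher, text, pos):
    """& expression: succeed with "" when the matcher succeeds; the position is unchanged."""
    return ("", pos) if matcher(text, pos) is not None else None

def not_predicate(matcher, text, pos):
    """! expression: succeed with "" when the matcher fails; the position is unchanged."""
    return ("", pos) if matcher(text, pos) is None else None

and_hit = and_predicate(literal("ab"), "abc", 0)   # succeeds, still at position 0
not_hit = not_predicate(literal("ab"), "abc", 0)   # fails, because "ab" is present
not_miss = not_predicate(literal("x"), "abc", 0)   # succeeds, still at position 0
```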

expression1 expression2 ... expressionN
Match a sequence of expressions and return their match results in an array.

expression1 / expression2 / ... / expressionN
Try to match the first expression; if it does not succeed, try the second one, and so on. Return the match result of the first successfully matched expression. If no expression matches, consider the match failed.
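
Sequence and ordered choice can be sketched in Python in the same spirit. This is an illustration under the assumption that a matcher is a function taking (text, pos) and returning (result, new_pos) or None; the names literal, sequence and choice are hypothetical.

```python
def literal(s):
    """Build a matcher for an exact string."""
    def matcher(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return matcher

def sequence(*matchers):
    """expression1 expression2 ...: all must match in order; results are collected in a list."""
    def matcher(text, pos):
        results = []
        for m in matchers:
            outcome = m(text, pos)
            if outcome is None:
                return None          # one failure fails the whole sequence
            value, pos = outcome
            results.append(value)
        return results, pos
    return matcher

def choice(*matchers):
    """expression1 / expression2 / ...: the first alternative that matches wins."""
    def matcher(text, pos):
        for m in matchers:
            outcome = m(text, pos)
            if outcome is not None:
                return outcome
        return None
    return matcher

seq_result = sequence(literal("a"), literal("b"))("ab", 0)
short_first = choice(literal("a"), literal("ab"))("ab", 0)   # "a" wins; "ab" is never tried
long_first = choice(literal("ab"), literal("a"))("ab", 0)
```

Because the choice is ordered, the order of the alternatives matters: a shorter alternative listed first hides a longer one that starts with the same text.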

PEG Builder Extended Grammar

Some parts of the original PEG grammar have been changed; this is a brief list:

- a rule can be defined using the equals sign '=' or the original arrow sign '<-' (e.g. program = 'a'+ or program <- 'a'+)

- a character can also be defined using these escape sequences:
1- '\\#' DecDigit+ 'd' (a decimal number with a variable number of digits --> \#32d or \#000032d)
2- '\\#' HexDigit+ 'x' (a hexadecimal number with a variable number of digits --> \#AAx or \#A16x)
3- '\\u' HexDigit HexDigit HexDigit HexDigit (a hexadecimal number with exactly 4 digits --> \u0020)

- the grammar can have additional information at the beginning of the file:
- @Name --> the grammar name (@Name = 'My first grammar')
- @By --> the grammar creator (@By = 'Lorenzi Davide')
- @CaseSensitive --> is the grammar case sensitive? Default is True. (@CaseSensitive = False)
- @Memoization --> should the engine use memoization? Default is True (@Memoization = False)

- annotations for rules. Annotations are defined inside curly braces, before the rule:
- @Error --> a better error description for the following rule (@Error = 'Wrong rule definition!')
- @Loose --> should the result of this rule be discarded? (@Loose)
- @Col --> permits you to define at which column the rule is valid (@Col = 10-20 or @Col = 1)
- @ExitIfKO --> stop the parsing if this rule fails
- @ExitIfOK --> stop the parsing if this rule succeeds
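
The three character escape forms listed above can be decoded with a short Python sketch. The function name decode_escapes is hypothetical, and the sketch makes one assumption of its own: it tries the hexadecimal form first, so that a sequence such as \#0dx is read as hexadecimal 0D rather than as decimal 0 followed by "x". How the real engine resolves that case is not stated here.

```python
import re

# Hexadecimal form first (an assumption of this sketch), then decimal, then \uXXXX.
_ESCAPE = re.compile(
    r"\\#([0-9A-Fa-f]+)x"      # \#AAx   : variable-length hexadecimal
    r"|\\#([0-9]+)d"           # \#32d   : variable-length decimal
    r"|\\u([0-9A-Fa-f]{4})"    # \u0020  : exactly four hexadecimal digits
)

def decode_escapes(text):
    """Replace each escape sequence in `text` with the character it denotes."""
    def replace(m):
        hex_digits, dec_digits, unicode_digits = m.groups()
        if dec_digits is not None:
            return chr(int(dec_digits, 10))
        return chr(int(hex_digits or unicode_digits, 16))
    return _ESCAPE.sub(replace, text)

space_dec = decode_escapes(r"\#32d")        # decimal 32
space_padded = decode_escapes(r"\#000032d") # leading zeros are allowed
space_uni = decode_escapes(r"\u0020")       # hexadecimal 20
euro = decode_escapes(r"\#20acx")           # hexadecimal 20AC
```

All three "space" variants decode to the same character, which shows that the forms differ only in notation. A code point beyond the Unicode range would make chr raise ValueError; the sketch does not handle that case.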

Nice things:

- EOF = !. --> (End of file: matches when no more characters are found)
- crlf = '\r\n' / '\r' / '\n' (End of line)
- EatLine = (!crlf !EOF .)* (Consume characters until an end of line or the end of the file is reached)
- Printable = [\#20x-\#7Ex\#A0x]
- PrintableExt = [\#a1x-\#ffx]
- AlphaNumeric = [\#30x-\#39x\#41x-\#5ax\#61x-\#7ax]
- WhiteSpace = [\#09x-\#0dx\#20x\#a0x]
- AllLatin = [\#41x-\#5ax\#61x-\#7ax\#aax\#b5x\#bax\#c0x-\#d6x\#d8x-\#f6x\#f8x-\#024fx\#1e00x-\#1effx\#2c60x-\#2c7fx\#a720x-\#a7ffx]
- AllWhiteSpace = [\#09x-\#0dx\#20x\#85x\#a0x\#1680x\#180ex\#2000x-\#200ax\#2028x\#2029x\#202fx\#205fx\#3000x]
- AllNewLine = [\#0ax\#0dx\#2028x\#2029x]
- EuroSign = [\#20acx]
- PrintableAnsi = [\#20x-\#7ex\#a0x-\#ffx]
- ControlCodes = [\#00x-\#1fx\#7fx-\#9fx]
- BasicLatin = [\#00x-\#7fx]
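
As an illustration of how the EatLine rule above behaves, here is a rough Python equivalent. The function name eat_line is hypothetical, and the sketch returns the consumed text as one string rather than the array of per-character results a PEG engine would produce.

```python
def eat_line(text, pos):
    """Consume characters from `pos` up to, but not including, the next line break.

    Mirrors EatLine = (!crlf !EOF .)* : stop at '\r', '\n' or the end of the input.
    Returns (consumed_text, new_position).
    """
    start = pos
    while pos < len(text) and text[pos] not in "\r\n":
        pos += 1
    return text[start:pos], pos

first_line = eat_line("abc\r\ndef", 0)   # stops in front of the "\r\n"
last_line = eat_line("abc\r\ndef", 5)    # stops at the end of the input
empty_line = eat_line("\nabc", 0)        # nothing to consume: zero repetitions
```

The line break itself is left in the input, so a grammar would typically follow EatLine with crlf (or EOF) to move on to the next line.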

Grammar definition (in PEG form)

Program = Spacing GlobalAttr* Definition+ EndOfFile
GlobalAttr = AttrName / AttrBy / AttrCaseSens / AttrMemo
AttrName = '@Name' Spacing LEFTARROW Spacing Literal
AttrBy = '@By' Spacing LEFTARROW Spacing Literal
AttrCaseSens = '@CaseSensitive' Spacing LEFTARROW Spacing (TRUE / FALSE)
AttrMemo = '@Memoization' Spacing LEFTARROW Spacing (TRUE / FALSE)
Definition = Annotations? IdentifierName LEFTARROW Expression

Annotations = '{' (Spacing Annotation)* '}' Spacing
Annotation = ErrorDef / LooseDef / ColDef / ExitKODef / ExitOKDef
ErrorDef = '@Error' Spacing '=' Spacing Literal
LooseDef = '@Loose' Spacing
ColDef = '@Col' Spacing '=' Spacing NumericRange # valid only from column to column
ExitKODef = '@ExitIfKO' Spacing
ExitOKDef = '@ExitIfOK' Spacing
NumericRange = IntLiteral Spacing '-' Spacing IntLiteral Spacing / IntLiteral Spacing
IntLiteral = [0-9]+

Expression = Sequence (SLASH Sequence)*
Sequence = Prefix+
Prefix = (AND / NOT)? Suffix
Suffix = Primary (QUESTION / STAR / PLUS)?
Primary = IdentifierCall !LEFTARROW
/ OPEN Expression CLOSE
/ Literal / Class / DOT

# divided only for highlighting in the builder
IdentifierName = IdentStart IdentCont* Spacing
IdentifierCall = IdentStart IdentCont* Spacing
IdentStart = [a-zA-Z_]
IdentCont = IdentStart / [0-9]
Literal = ['] (!['\n\r] Char)* ['] Spacing
/ ["] (!["\n\r] Char)* ["] Spacing
Class = '[' (!']' Range)+ ']' Spacing
Range = Char '-' Char / Char
Char = '\\' [nrt'"\[\]\\]
/ '\\#' DecDigit+ 'd' # dav
/ '\\#' HexDigit+ 'x' # dav
/ '\\u' HexDigit HexDigit HexDigit HexDigit # dav
/ !'\\' .

DecDigit = [0-9]
HexDigit = [0-9A-Fa-f]
TRUE = ('True' / 'true') Spacing
FALSE = ('False' / 'false') Spacing
LEFTARROW = ('=' / '<-') Spacing
SLASH = '/' Spacing
AND = '&' Spacing
NOT = '!' Spacing
QUESTION = '?' Spacing
STAR = '*' Spacing
PLUS = '+' Spacing
OPEN = '(' Spacing
CLOSE = ')' Spacing
DOT = '.' Spacing
Spacing = (Space / Comment)*
Comment = '#' (!EndOfLine .)* EndOfLine
Space = ' ' / '\t' / EndOfLine
EndOfLine = '\r\n' / '\n' / '\r'
EndOfFile = !.
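
Almost every token rule above ends with Spacing, which skips blanks and '#' comments. The following Python sketch shows one way to read the Spacing, Comment and Space rules as written; the function name skip_spacing is hypothetical and this is not the builder's own code.

```python
def skip_spacing(text, pos):
    """Return the position after any run of spaces, tabs, line breaks and '#' comments.

    Follows Spacing = (Space / Comment)* from the grammar above. As written,
    Comment must end with EndOfLine, so a '#' comment that runs to the end of
    the input without a line break is not skipped.
    """
    while pos < len(text):
        if text[pos] in " \t\r\n":
            pos += 1
            continue
        if text[pos] == "#":
            end = pos
            while end < len(text) and text[end] not in "\r\n":
                end += 1
            if end == len(text):
                break      # unterminated comment: the Comment rule fails here
            pos = end      # the line break is consumed as Space on the next pass
            continue
        break
    return pos

after_comment = skip_spacing("  # note\n  x = 'a'", 0)   # lands on the "x"
no_spacing = skip_spacing("abc", 0)
unterminated = skip_spacing("# end", 0)
```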
