This short presentation draws on the computational complexity of Perl 5 regexes, the experimental fetures introduced to P5 later on and the pattern expression grammars in Perl 6. It shows some examples of how PEGs can be used for data exploratory parsing.
4. What drives complexity of Perl 5 regex?
? Basic regular expressions (describing regular
languages) are easy to match: O(|T|) /
O(|r|*|T|)
? Backreferences:
¨C Exponential time O(2^|T|)
¨C NP-complete problem (3SAT reduction)
¨C implicate the use of backtracking algorithms
5. ¡
? Look-behind assertions add even more
complexity (input is not consumed): O(2|T|+|r|)
? Experimental features of code assertions
introduced undecidability:
a=~/<?{do 1 while 2;}>/
6. Apocalypse 5
Perl 5 REGEXes were:
? too compact and 'cute¡®
? had too much reliance on too few
metacharacters
? little support for named captures
? little support for grammars
? poor integration with the 'real' language
7. What¡¯s new in Perl 6 rules?
? Named captures (backported to 5.10)
? Simpler notation for non-capturing groups
? White spaces ignored unless backspaced
? Simplified code assertions (experimental in
5.18)
? New character classes and set operations on
char classes
8. Now the big things
? Rules are part of the ¡®real¡¯ language (same
lexer/parser)
? Support for Parsing Expression Grammars (PEGs)
? Backtracking/matching control
? Code assertions can fail or succeed unlike in Perl
5
$perl -e "'abrakadabra'=~/(ab)(?{0}).*(?{print 'OK'})/¡°
OK
$perl6 -e "'abrakadabra'~~/(ab)<?{0}>.*<?{say 'OK'}>/¡°
9. PEGs
? Similar to CFG but unambiguous in terms of parse
trees
? CFG rules explain how to produce words ¨C PEGs
explain how to parse them:
¨C CGF for {an bn : n >=1}:
? S -> 'a' S 'b¡®
? S -> ¦Å
¨C PEG
? S ¡û 'a' S? 'b'
? PEG: negative/positive look-ahead assertions
10. ¡
? With PEGs it¡¯s possible to describe some non-
context-free grammars {an bn cn : n >=1}
? Recognizing PEG expressions can be done in
linear time if there¡¯s enough memory
? Rakudo comes with a JSON parser
implementation written in Perl 6 rules!
? Naturaly, Perl6 grammar is also expressed in
PEG
11. Exploratory parsing
grammar BookGrammar {
rule TOP {<book>+}
regex book {<author><whitespace><title> | <title><whitespace><author>}
regex author { (<name>|<initial>) [<whitespace>[<name>|<initial>]]?
<whitespace> <surename> }
rule name {(<word>)<?{ %firstNames{$0}:exists}>}
rule surename {<word>}
rule title { '"'<word>[<whitespace><word>]*'"'}
token word { <alpha>+ }
token initial { <alpha>.? }
token whitespace {s+}
}
12. ¡
sub MAIN() {
my $match = BookGrammar.parse('Joseph Conrad "Lord Jim"');
say $match;
}