狠狠撸

Perl 6 rules
Radek Kotowicz
2nd Polish Perl Workshop,
Poznań, May 2014

What drives complexity of Perl 5 regex?
? Basic regular expressions (describing regular
languages) are easy to match: O(|T|) /
O(|r|*|T|)
? Backreferences:
– Exponential time O(2^|T|)
– NP-complete problem (3SAT reduction)
– implicate the use of backtracking algorithms

…
? Look-behind assertions add even more
complexity (input is not consumed): O(2|T|+|r|)
? Experimental features of code assertions
introduced undecidability:
a=~/<?{do 1 while 2;}>/

Apocalypse 5
Perl 5 REGEXes were:
? too compact and 'cute‘
? had too much reliance on too few
metacharacters
? little support for named captures
? little support for grammars
? poor integration with the 'real' language

What’s new in Perl 6 rules?
? Named captures (backported to 5.10)
? Simpler notation for non-capturing groups
? White spaces ignored unless backspaced
? Simplified code assertions (experimental in
5.18)
? New character classes and set operations on
char classes

Now the big things
? Rules are part of the ‘real’ language (same
lexer/parser)
? Support for Parsing Expression Grammars (PEGs)
? Backtracking/matching control
? Code assertions can fail or succeed unlike in Perl
5
$perl -e "'abrakadabra'=~/(ab)(?{0}).*(?{print 'OK'})/“
OK
$perl6 -e "'abrakadabra'~~/(ab)<?{0}>.*<?{say 'OK'}>/“

PEGs
? Similar to CFG but unambiguous in terms of parse
trees
? CFG rules explain how to produce words – PEGs
explain how to parse them:
– CGF for {an bn : n >=1}:
? S -> 'a' S 'b‘
? S -> ε
– PEG
? S ← 'a' S? 'b'
? PEG: negative/positive look-ahead assertions

…
? With PEGs it’s possible to describe some non-
context-free grammars {an bn cn : n >=1}
? Recognizing PEG expressions can be done in
linear time if there’s enough memory
? Rakudo comes with a JSON parser
implementation written in Perl 6 rules!
? Naturaly, Perl6 grammar is also expressed in
PEG

Exploratory parsing
grammar BookGrammar {
rule TOP {<book>+}
regex book {<author><whitespace><title> | <title><whitespace><author>}
regex author { (<name>|<initial>) [<whitespace>[<name>|<initial>]]?
<whitespace> <surename> }
rule name {(<word>)<?{ %firstNames{$0}:exists}>}
rule surename {<word>}
rule title { '"'<word>[<whitespace><word>]*'"'}
token word { <alpha>+ }
token initial { <alpha>.? }
token whitespace {s+}
}

…
sub MAIN() {
my $match = BookGrammar.parse('Joseph Conrad "Lord Jim"');
say $match;
}

…
Joseph Conrad "Lord Jim""
book => "Joseph Conrad "Lord Jim""
author => "Joseph Conrad"
0 => "Joseph"
name => "Joseph"
0 => "Joseph"
word => "Joseph"
alpha => "J"
alpha => "o"
alpha => "s"
alpha => "e"
alpha => "p"
alpha => "h"
whitespace => " "
surename => "Conrad"
word => "Conrad"
alpha => "C"
alpha => "o"
alpha => "n"
alpha => "r"
alpha => "a"
alpha => "d"
whitespace => " "
title => ""Lord Jim""
word => "Lord"
alpha => "L"
alpha => "o"
alpha => "r"
alpha => "d"
whitespace => " "
word => "Jim"
alpha => "J"
alpha => "i"
alpha => "m"

狠狠撸

Perl 5 & 6 regex complexity

More Related Content

Perl 5 & 6 regex complexity