New to “Advanced Regular Expressions”

This chapter has lots of new material and much of the old text is gone. I wasn’t looking forward to working on this chapter because I knew it was going to be mostly new text.

My goal for Mastering Perl has always been to present material not already well covered in some other book. This chapter did that when I first wrote it, but I have since moved some of it into other books.

The material on regular expression references moved into Intermediate Perl, so in Mastering Perl I assume that readers already know it. There went a third of the chapter.

I had a medium-sized section on YAPE::Regex::Explain, a module that turns a pattern into an English description of the pattern. That module hasn’t kept up with post-v5.10 features and has been abandoned. That’s gone.

Curiously, neither Learning Perl nor Intermediate Perl had covered non-capturing parentheses. I fixed that, so that section is gone from Mastering Perl.

But this left room for much more exciting and advanced things.

Randal Schwartz meditated on a wickedly tight minimal JSON parser that he wrote for a particular client situation. He used advanced features including regex grammars, code execution with (?{...}), and data structure bootstrapping with $^R. I wanted to explain that regex, but first I needed to talk about all of the features in it. I think I did a pretty good job in the normal camelid trilogy style: I start with a simple program, find an edge case that causes a problem, modify, and repeat. I use the example of matching nested quotes to cover these features, one at a time:

  • (?PARNO) to refer to earlier capture groups as independent patterns
  • Recursion with (?PARNO), which allows matching balanced text
  • (?(DEFINE)...) to create and name sub patterns for use later
  • (?{...}) to watch what’s happening in a regex
  • $^N inside (?{...}) to get the text of the previous capture buffer
  • bootstrapping a data structure with $^R inside (?{...})
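Here's a rough sketch of how a few of these features fit together. It is not the nested-quote example from the chapter: I've swapped in balanced parentheses and a whitespace-separated word list because they are shorter, and the sub names (first_balanced, is_balanced, collect_words) are made up for illustration. It assumes a reasonably recent perl (v5.18 or later) so that the code blocks can see the lexical variable.

```perl
use strict;
use warnings;

# Recursion with (?1): group 1 re-enters itself to match nested parentheses.
sub first_balanced {
    my ($text) = @_;
    return $text =~ /
        (                   # group 1: one balanced chunk
            \(
            (?:
                [^()]++     # a run of other characters, matched possessively
                |
                (?1)        # recurse into group 1 for a nested chunk
            )*
            \)
        )
    /x ? $1 : undef;
}

# The same idea with a named subpattern tucked away in (?(DEFINE)...).
sub is_balanced {
    my ($text) = @_;
    return $text =~ /
        \A (?&chunk) \z
        (?(DEFINE)
            (?<chunk> \( (?: [^()]++ | (?&chunk) )* \) )
        )
    /x ? 1 : 0;
}

# Building a data structure during the match. Each code block's value lands
# in $^R, and $^N is the text of the capture group that closed most recently.
# (No sigils in the regex comments: they would be interpolated.)
sub collect_words {
    my ($text) = @_;
    my $words;
    $text =~ /
        \A
        (?{ [] })                                   # seed with an empty list
        (?: (\w+) (?{ [ @{ $^R }, $^N ] }) \s* )+   # append the latest word
        \z
        (?{ $words = $^R })                         # copy the result out
    /x or return;
    return @$words;
}

print first_balanced('f(a(b)c)(d)'), "\n";
print is_balanced('(a(b)c)'), is_balanced('(a(b)c'), "\n";
print join( '|', collect_words('alpha beta gamma') ), "\n";
```

Because every (?{...}) block returns a new array reference instead of modifying an old one, backtracking can throw away a partial result without any cleanup.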

Once I cover all of those, I can show off Randal’s regex. Instead of explaining it, though, I leave that to the reader. By the time I get there, they should understand all of the features he uses.

That stuff was a bit of a bear to figure out. The docs aren’t great and there are very few examples out there. I even typed out a long StackOverflow question about it that led me to answering my own question.

Besides that, I pull out \K, another v5.10 feature, to fix my broken money commifying example. I’m not actually the person who fixed it though; I lifted that regex from Michael Carmen’s answer to my StackOverflow question about it.
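To give a feel for what \K does, here's a minimal sketch. This is not the regex from the answer I mention above; it's a simplified stand-in that assumes whole-number amounts with no decimal part, and the commify sub name is only for illustration. \K tells the regex engine to keep everything matched so far out of the text that gets replaced, so it does the job of a lookbehind here.

```perl
use strict;
use warnings;

# Insert commas into a whole number (assumption: no decimal part).
sub commify {
    my ($n) = @_;
    # Match one digit, then \K drops it from the part to be replaced.
    # The lookahead requires one or more complete groups of three digits
    # before a word boundary, so a comma goes in at each such position.
    $n =~ s/\d\K(?=(?:\d{3})+\b)/,/g;
    return $n;
}

print commify('1234567'), "\n";
print commify('$1000'),   "\n";
print commify('999'),     "\n";
```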

After all that, I end the chapter with an expanded section on regular expression debugging, including Damian Conway’s Regexp::Debugger. That’s not very exciting in a static book, so I’ll have to make some screencasts about it here.
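Regexp::Debugger is interactive, so it doesn't show well in text either, but the core re pragma gives a static taste of regex debugging. A small sketch, with a throwaway pattern and a made-up sub name (debug_match): the pragma is lexically scoped, and it dumps the compiled program and the match trace to standard error.

```perl
use strict;
use warnings;

sub debug_match {
    my ($text) = @_;
    use re 'debug';    # affects only regexes compiled in this block
    return $text =~ /\b(\w+)-(\d+)\b/ ? "$1 $2" : undef;
}

# The debugging dump goes to STDERR; the result goes to STDOUT.
print debug_match('part-42'), "\n";
```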

You can read this chapter in O’Reilly Atlas.

One thought on “New to “Advanced Regular Expressions””

  1. See ppixregexplain.pl and wxPPIxregexplain.pl, and:

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use autodie qw/ chdir /;
    use Data::Dump qw/ dd pp /;
    use Path::Tiny qw/ path /;
    
    my $here = path( __FILE__ )->realpath->parent;
    chdir $here;
    # '.' is not in @INC as of v5.26, so load the script by explicit path
    require './ppixregexplain.pl';
    my $outfh = path( 'merlyn-jsonparser-995856.html' )->openw_raw;
    select $outfh;
    my $re = path( 'merlyn-jsonparser-995856.pl' )->slurp_raw;
    MainXplain( '--html', $re );
    __END__
