Chapters – Page 4 – Mastering Perl

New To “Jury Rigging and Modifying Modules”

Chapter 10 is another easy chapter to update. The language hasn’t changed enough to affect how we deal with Perl’s object system (although Perl might not always be that stable). I did bring in a few paragraphs about Git, pull requests, and Git::CPAN::Patch. Check out this chapter in O’Reilly Atlas.

New to “Configuration”

I didn’t have to make many changes to Chapter 11: Configuration. This is mostly a stable part of the Perl ecosystem. I updated a few minor things.

Readonly is out, Const::Fast is in. I only talk about constants for a short section where I write about a particular way (an anti-pattern, even) to configure programs. The Readonly module hasn’t been updated in years. It uses tie to perform its magic and has several interesting edge cases from that, as you can see in its RT queue. Leon Timmermans replaced it with Const::Fast. It uses almost the same interface but “by doing everything the opposite way Readonly does it.” I’ve asked Eric Roode, Readonly’s maintainer, if I can get it into GitHub and have the community update it. I’ll see how that goes.

In two places I had mentioned Google Code Search, the now dead project that was part of Google’s mission to “organize the world’s information and make it universally accessible and useful”. Instead, I found Ohloh’s code search. In the chapter, I look for “DON’T EDIT BELOW THIS LINE” and config.pl to show how common those are.

AppConfig is out. I don’t think it was widely used even when it was fresh, but it’s as old as the first edition of Mastering Perl. Instead, I mentioned JSON and YAML, although I’ll cover those in Chapter 14: Lightweight Persistence. I should survey people to see how they are configuring things nowadays.

Do you have anything else I should include or update? See what I have so far by reading it through Atlas.

ReturnValue on PrePAN

As part of the “Error Handling and Reporting” chapter, I’ve been developing my use of normal return values to indicate failure. Instead of tricky value checking, I can use the object type to decide what happened and then look in the object to get the value. That way, success and error values can use the same code path. I’ve created ReturnValue on GitHub.
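Here is a rough sketch of that idea, using only core Perl. It is not ReturnValue’s actual interface: the My::Result classes and the divide function are hypothetical names made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical result classes for illustration; not the ReturnValue API.
package My::Result {
    sub new   { my ($class, %args) = @_; return bless {%args}, $class }
    sub value { $_[0]{value} }
}
package My::Result::Success { our @ISA = ('My::Result') }
package My::Result::Error   { our @ISA = ('My::Result') }

package main;

# Always return an object; its class says whether the call worked.
sub divide {
    my ($numerator, $denominator) = @_;
    return My::Result::Error->new(value => 'division by zero')
        if $denominator == 0;
    return My::Result::Success->new(value => $numerator / $denominator);
}

for my $pair ([10, 2], [1, 0]) {
    my $result = divide(@$pair);
    my $status = $result->isa('My::Result::Error') ? 'error' : 'ok';
    print "$status: ", $result->value, "\n";
}
```

The caller never inspects a magic value such as undef or -1. It asks the object what kind of result it is and then reads the value the same way in both cases.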

Before uploading to CPAN, however, I’m trying PrePAN for the first time. This allows me to show off ReturnValue to the Perl community without claiming the namespace. I don’t deal with PAUSE at all. If I decide to abandon it, it’s no big deal. Comment on ReturnValue at PrePAN.

You can also read the draft chapter on O’Reilly Atlas.

New to “Error Handling”

There’s much to update in Chapter 12, “Error Handling”. I thought this would be an easy chapter.

Since v5.10, Fatal is different and now called autodie. I mostly had to change the module names. The biggest change was removing the discussion of how Fatal could skip builtins that I called in non-void context. If Fatal saw that I checked the return value of a builtin myself, it didn’t throw the exception for me. I excised that paragraph.
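As a minimal sketch of what autodie looks like in use (the file path is a made-up one that I assume does not exist), a failed open throws an autodie::exception object that I can catch and inspect:

```perl
use strict;
use warnings;
use autodie;    # builtins such as open now throw exceptions on failure

# Hypothetical path, assumed not to exist on this system.
my $missing = '/no/such/directory/config.txt';

my $opened = eval {
    open my $fh, '<', $missing;    # no "or die" needed
    1;
};

my $error = $@;
if (!$opened && ref $error && $error->isa('autodie::exception')) {
    # matches() checks which builtin failed; function() names it.
    print "failed call: ", $error->function, "\n" if $error->matches('open');
}
```

Because the exception is an object, I can decide what to do based on which builtin failed instead of parsing an error string.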

In v5.14 the behavior of $@ changes quite a bit. An eval inside a destructor won’t mess up $@ as the scope is cleaning up.
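A small sketch shows the situation that change addresses. The Noisy::Guard class is a hypothetical name for this example. Its destructor runs a successful eval while the outer eval is unwinding from a die:

```perl
use strict;
use warnings;

package Noisy::Guard {    # hypothetical class for illustration
    sub new { return bless {}, shift }
    sub DESTROY {
        # This eval succeeds, which resets $@ to the empty string
        # while the object is being destroyed.
        eval { 1 };
    }
}

package main;

my $ok = eval {
    my $guard = Noisy::Guard->new;
    die "original error\n";    # $guard is destroyed as the scope unwinds
    1;
};

# On v5.14 and later, $@ still holds the original error here. On
# earlier perls the eval in DESTROY could leave it empty.
my $error = $@;
print $ok ? "no error seen\n" : "caught: $error";
```

Checking the return value of eval (the $ok variable here) rather than the truth of $@ is still the safer habit for code that has to run on older perls.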

I’m covering Try::Tiny and TryCatch. I’m still working on those sections.

Do you have anything else I should include or update? See what I have so far by reading it through Atlas.

New to “Logging”

I’ve expanded the Logging chapter to discuss more of the Log4perl features, including the Nested Diagnostic Context and the Mapped Diagnostic Context, both of which allow me to keep track of information that I can interpolate into log messages.

I expanded the discussion of categories a bit more, but not that much. I think it’s a pretty simple feature.

See if you like it by reading it through Atlas.

New to “Cleaning Up Perl”

There’s not much that I needed to update in the chapter devoted to Perl style. The Perl::Tidy stuff is the same, and I updated the Perl::Critic section with a new program to analyze.

When I originally wrote this chapter, use.Perl was still going and I used my journal reading program as the violation program. Since that site is no longer active, I switched to my retweeter roulette program that I use to select winners from my Twitter giveaways.

There are some errors there, but my style has evolved in the seven years since then so I had to work a little harder to get violations.

I also added a bit more on turning off policies, especially for single lines. I left it out last time as a way to discourage it through ignorance, but I’ve changed my mind about that. Using the ## no critic trick is so annoying that it’s easier to comply.
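As a short sketch of the single-line form, assuming the stringy-eval policy is the one I want to silence, the annotation goes at the end of the offending statement:

```perl
use strict;
use warnings;

# The trailing annotation silences Perl::Critic for this statement only.
# Naming the policy keeps every other policy active on the line.
my $sum = eval "2 + 3";    ## no critic (BuiltinFunctions::ProhibitStringyEval)

print "sum is $sum\n";
```

The program runs the same with or without Perl::Critic installed, since the annotation is only a comment as far as perl is concerned.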

Have I left anything out that you like to do? You can read Chapter 7 through the O’Reilly Atlas pre-publication review program.

New to “Advanced Regular Expressions”

You can read the draft of Chapter 2 in O’Reilly Atlas.

There’s much that I can change in this chapter:

Set operations in regular expressions
The \K
The performance penalty of $&, and Devel::NYTProf for finding the problem areas
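Two of those items fit in a short sketch. The strings are made up for the example, and the /p flag is one common way around the $& penalty rather than necessarily the approach the chapter takes:

```perl
use strict;
use warnings;

# \K keeps what matched so far out of the replaced text, so only the
# digits after "price: " change.
my $line = 'price: 100 USD';
$line =~ s/price:\s*\K\d+/250/;
print "$line\n";

# The /p flag (v5.10 and later) fills ${^PREMATCH}, ${^MATCH}, and
# ${^POSTMATCH} for this match alone, instead of relying on the
# match variables that slow down every regex on older perls.
my ($before, $matched, $after);
if ('Hello World' =~ /World/p) {
    ($before, $matched, $after) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
    print "matched '$matched' after '$before'\n";
}
```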

New in “Secure Programming Techniques”

This chapter contains most of the original text, although with a few tweaks. There are two big additions which I did not cover in the first edition of this book.

I added a section on security with the DBI module and SQL injection. I don’t really think it belongs in this book any more than any other sort of problem with a CPAN module, but enough people complained that I relented.

And, I added a brief introduction to the Safe module. This is a rarely used security feature that you might find useful if you have to use string eval.
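A minimal sketch of the Safe module, using its default compartment and two made-up code strings, looks like this:

```perl
use strict;
use warnings;
use Safe;

# A compartment with the default operator mask.
my $compartment = Safe->new;

# Plain arithmetic is allowed.
my $sum = $compartment->reval('2 + 3');
print "sum: $sum\n";

# system is outside the default set of permitted operators, so this
# string fails to compile inside the compartment and $@ says why.
my $result = $compartment->reval('system("echo should not run")');
my $error  = $@;
print "blocked: $error" if $error;
```

The compartment only restricts which operators the evaluated string may use. It is not a complete sandbox, so I still have to think about what data I share with it.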

I’ve added some of the sample programs to the downloads page.

You can read the draft chapter now.

New in “Working with Bits”

You can read the draft of Chapter 16 in O’Reilly Atlas.

Bits and bit vectors in Perl haven’t changed since the first edition, so there’s not much to update in this chapter. I thought that Abigail’s prime number regex might deserve some space, but it turns out that it didn’t.

I also thought that the octal prefix 0o had made it in since its proposal back in the v5.15 days. It had some interesting parsing problems, and eventually the proposal was dropped.

When I last worked on this book, I was running a 32-bit perl with v5.8. Now I have a 64-bit v5.18. The output of some of the Devel::Peek examples changed a little, so I updated those.

I added an example of using a bit vector to cache the positions of prime numbers. I can create a big string where each bit represents one number. When it’s time to check if a number is prime, I simply check the right bit.
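Here is a sketch of that kind of cache using vec. Filling the bits with a Sieve of Eratosthenes and stopping at 100 are my assumptions for the example, not necessarily what the chapter does:

```perl
use strict;
use warnings;

my $limit = 100;    # arbitrary upper bound for the example

# One bit per number, all set to 1 ("maybe prime") to start.
my $bits = "\xFF" x (int($limit / 8) + 1);
vec($bits, $_, 1) = 0 for 0, 1;    # 0 and 1 are not prime

# Sieve of Eratosthenes: clear the bit for every multiple of each prime.
for my $n (2 .. int sqrt $limit) {
    next unless vec($bits, $n, 1);
    for (my $multiple = $n * $n; $multiple <= $limit; $multiple += $n) {
        vec($bits, $multiple, 1) = 0;
    }
}

# Checking a number up to $limit is now a single bit lookup.
sub is_prime { my ($n) = @_; return vec($bits, $n, 1) }

print join(' ', grep { is_prime($_) } 0 .. 30), "\n";
```

The whole table for the numbers up to 100 fits in 13 bytes, which is the appeal of a bit vector over an array or hash of flags.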

Finally, at the end of the chapter, I updated the URLs for “Further Reading”. In eight years the URLs have moved around a bit.

Abigail’s prime number checker regex

As I was working on the “Working with Bits” chapter of Mastering Perl, I thought about the prime number checker from Abigail. I’ve known about that for years. Abigail came up with it in 1998, and it’s listed in the CPAN JAPH file. I never bothered to look into how it worked since I figured it was something clever:

% perl -E 'say "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/' 1234567

His use of 1 made me think there might be something interesting to say about bit vectors, but that turned out not to be so. I could use any character (or sequence) in place of the 1.
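To check that claim, here is a sketch that wraps the same pattern in a subroutine and builds it from an arbitrary unit string. The is_prime_unary name and its $unit parameter are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical wrapper: $unit is whatever string stands in for the 1.
sub is_prime_unary {
    my ($n, $unit) = @_;
    $unit = '1' unless defined $unit;
    my $u = quotemeta $unit;
    return (($unit x $n) !~ /^(?:$u)?$|^((?:$u)(?:$u)+?)\1+$/) ? 1 : 0;
}

for my $unit ('1', 'x', 'ab') {
    my @primes = grep { is_prime_unary($_, $unit) } 0 .. 20;
    print "$unit => @primes\n";
}
```

Each unit produces the same list of primes, so the digit 1 has no special meaning in the pattern.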

Still, I’ll break it down. The program has this structure:

say "Prime" if STRING !~ REGEX

The !~ is a negated match. The condition is true if the string does not match the regular expression. If the string matches, then it’s a composite number so the condition is false. It might make you feel better to negate the condition with unless instead:

say "Prime" unless STRING =~ REGEX

Abigail’s pattern looks a bit more complicated than it really is because of the 1 character being both a literal and a back reference:

^1?$|^(11+?)\1+$

When I see regexes where I don’t immediately see what is going on, I look for some hook in it so I can break it down. In this regex, there’s the alternation, and both sides of the alternation use the beginning of line (^) and end of line anchors ($). I suspect Abigail did that for symmetry and to avoid the parentheses he’d need to solve the precedence problem. Otherwise, it would look like this, spread out a bit:

^ (?:  1? | (11+?) \1+ ) $

Assuming that the anchors match, the inside portion is two branches:

1? | (11+?) \1+

It’s one of these:

1?
(11+?) \1+

The first one is easy. It’s zero or one characters, and nothing more. That matches the trivial conditions where the input number is 0 or 1, neither of which is prime. If it’s not exactly zero or one characters, the regex tries the other side of the alternation:

(11+?) \1+

There are two parts to this: the capture and the back reference. The back reference can match one or more times. So, it’s looking for complete groups, and nothing more, of the same thing. The actual thing we match hardly matters. It comes down to being able to break up the string evenly into groups.

For example, I take the number nine. The string for that is a sequence of nine 1s:

111111111

Can I break up that string into equal groups with nothing left over? Sure; there are three groups of three:

111 111 111

The number of groups represents one factor, and the length of any one of the groups represents another factor. Since it finds two factors, the number must be composite, so not prime.

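I can do it again with seven:

1111111

There isn’t a way to create equally-sized groups. Because the +? quantifier is non-greedy, the regex tries the shortest string it can for the capture (11), then checks that it can match the back reference one or more times up to the end of the string. When that fails, it lengthens the capture by one character and tries again. It keeps failing: groups of two leave one character over (11 11 11 1), groups of three leave one over (111 111 1), and a group of four or more leaves too few characters for even one complete repeat.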

Thus, the pattern doesn’t match 1111111, so the number is prime because there’s not another factor.
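The grouping idea can be sketched by looking at what the capture holds after a match. The smallest_grouping subroutine is a hypothetical helper for this example, and it uses only the second branch of the pattern:

```perl
use strict;
use warnings;

# Report the grouping the second branch finds, if there is one.
sub smallest_grouping {
    my ($n) = @_;
    return unless ('1' x $n) =~ /^(11+?)\1+$/;
    my $group_length = length $1;
    return ($group_length, $n / $group_length);
}

for my $n (9, 7, 12) {
    my ($length, $groups) = smallest_grouping($n);
    if (defined $length) {
        print "$n: $groups groups of $length, so composite\n";
    }
    else {
        print "$n: no even grouping, so prime\n";
    }
}
```

For nine this reports three groups of three, and for twelve it reports six groups of two, since the non-greedy capture settles on the shortest group that works.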

That’s the basic idea, but the regex engine gets there in many more steps. I can see how it tries the shortest first group, fails, and keeps backtracking to try a longer one. I ran this under Regexp::Debugger.

(Video: Perl prime number regex under Regexp::Debugger, from brian d foy on Vimeo.)

That’s all nice, but there’s nothing about bit vectors there. Oh well.