As I was working on the “Working with Bits” chapter of *Mastering Perl*, I thought about the prime number checker from Abigail. I’ve known about that for years. Abigail came up with it in 1998, and it’s listed in the CPAN JAPH file. I never bothered to look into how it worked since I figured it was something clever:

% perl -E 'say "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/' 1234567

His use of `1`

made me think there might be something interesting to say about bit vectors, but that turned out not to be so. I could use any character (or sequence) in place of the `1`

.

Still, I’ll break it down. The program has this structure:

say "Prime" if STRING !~ REGEX

The `!~`

is a negated match. The condition is true if the string *does not* match the regular expression. If the string matches, then it’s a composite number so the condition is false. It might make you feel better to negate the condition with `unless`

instead:

say "Prime" unless STRING =~ REGEX

Abigail’s pattern looks a bit more complicated than it really is because of the `1`

character being both a literal and a back reference:

^1?$|^(11+?)\1+$

When I see regexes where I don’t immediately see what is going on, I look for some hook in it so I can break it down. In this regex, there’s the alternation, and both sides of the alternation use the beginning of line (`^`

) and end of line anchors (`$`

). I suspect Abigail did that for symmetry and to avoid the parentheses he’d need to solve the precedence problem. Otherwise, it would look like this, spread out a bit:

^ (?: 1? | (11+?) \1+ ) $

Assuming that the anchors match, the inside portion is two branches:

1? | (11+?) \1+

It’s one of these:

1? (11+?) \1+

The first one is easy. It’s zero or one characters, and nothing more. That matches the trivial condition where the input number is `1`

. If it’s not exactly one character, the regex tries the other side of the alternation:

(11+?) \1+

There are two parts to this: the capture and the back reference. The back reference can match one or more times. So, it’s looking for complete groups, and nothing more, of the same thing. The actual thing we match hardly matter. It comes down being able to break up the string evenly into groups.

For example, I take the number nine. The string for that is a sequence of nine *1*‘s:

111111111

Can I break up that string into equal groups with nothing left over? Sure; there are three groups of three:

111 111 111

The number of groups represent one factor, and the length of any one of the groups represents another factor. Since it finds two factors, the number must be composite, so not prime.

I can do it again with seven:

1111111

There isn’t a way to create equally-sized groups. The regex tries the longest string it can for the capture, then checks that it can match the back reference one or more times. It keeps failing:

111111 1 11111 11 1111 111 111 111 1 11 11 11 1

Thus, the pattern doesn’t match `1111111`

, so the number is prime because there’s not another factor.

That’s the basic idea, but the regex engine gets there in many more steps. I can see how it tries the longest first group, fails, and keeps backtracking. I ran this under Regexp::Debugger.

That’s all nice, but there’s nothing about bit vectors there. Oh well.

I’ve seen the regex “^1?$|^(11+?)\1+$” been described as a “prime number” regex from several websites (yours too!), but it isn’t!! It is a COMPOSITE number checker. It returns TRUE if the string is a composite number of 1s long. If you have a “!~” or a -v option in your regex package that’s great. But what if you don’t: IS THERE A REGEX THAT RECOGNIZES PRIME NUMBER DIRECTLY, WITHOUT ‘INVERTING’ THE ANSWER AT THE END??

It’s a bit of semantics, but this recognizes a prime by checking if it’s a composite, just like a sieve would. How else would you recognize a prime? If you come up with a way you’ll probably get the Fields Medal!