Regular Expressions

    The m// and s/// operators return a value: m// returns true if it matched, and s/// returns the number of replacements it made. You can either use the value directly, or check it for truth.
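    As a minimal sketch of using those return values (the sample string and variable names are made up for illustration):

```perl
use strict;
use warnings;

my $text = 'one fish, two fish, red fish';

# m// in boolean context: true if the pattern matched.
my $found = $text =~ /fish/ ? 1 : 0;
print "Found a fish\n" if $found;

# s/// returns the number of replacements, so the count can be used directly.
my $copy     = $text;
my $replaced = $copy =~ s/fish/cat/g;
print "Replaced $replaced times\n";

# When nothing is replaced, s/// returns a false value.
my $none = $copy =~ s/dog/cat/g;
print "No dogs replaced\n" unless $none;
```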

    Don't use capture variables without checking that the match succeeded.

    The capture variables, $1, etc., are not valid unless the match succeeded, and a failed match does not clear them, so they may still hold values from an earlier successful match.

        # BAD: Not checked, but at least it "works".
        my $str = 'Perl 101 rocks.';
        $str =~ /(\d+)/;
        print "Number: $1"; # Prints "Number: 101";

        # WORSE: Not checked, and the result is not what you'd expect
        $str =~ /(Python|Ruby)/;
        print "Language: $1"; # Prints "Language: 101";

        # GOOD: Check the results
        my $str = 'Perl 101 rocks.';
        if ( $str =~ /(\d+)/ ) {
            print "Number: $1"; # Prints "Number: 101";
        }

        if ( $str =~ /(Python|Ruby)/ ) {
            print "Language: $1"; # Never gets here
        }

    In list context, m// returns a list of matches: the captured groups, or, with /g and no captures, every match.
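    A short sketch of list-context matching; the date string and variable names are illustrative only:

```perl
use strict;
use warnings;

my $date = '2024-03-15';

# With captures and no /g, list context returns the captured groups.
my ( $year, $month, $day ) = $date =~ /^(\d{4})-(\d{2})-(\d{2})$/;
print "$year/$month/$day\n";

# With /g and no captures, list context returns every match.
my @words = 'Perl 101 rocks' =~ /\w+/g;
print scalar(@words), " words\n";

# A failed match returns an empty list, so $lang stays undefined.
my ($lang) = $date =~ /(Python|Ruby)/;
print "No language found\n" unless defined $lang;
```

    Assigning the match to a list also sidesteps the stale $1 problem, because a failed match simply leaves the variables undefined.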

    Common match flags

    • /i - case insensitive match
    • /g - match multiple times
        my $var = "match match match";

        my $count = 0;
        while ($var =~ /match/g) { $count++; }
        print "$count\n"; # prints 3

        $count = 0;
        $count++ foreach ($var =~ /match/g);
        print "$count\n"; # prints 3
    • /m - ^ and $ change meaning
      • Ordinarily, ^ means "start of string" and $ means "end of string"
      • /m makes them mean start and end of line, respectively
      • Use \A and \z for start and end of string regardless of /m
      • \Z is the same as \z except it will ignore a final newline
    • /s - . also matches newline
        my $str = "one\ntwo\nthree\n";
        $str =~ /^(.{8})/s;
        print $1; # prints "one\ntwo\n"
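    The /i and /m bullets above, along with \A, \z and \Z, can be sketched as follows. The sample text and variable names are assumptions chosen for the illustration:

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# /i ignores case.
my $case_insensitive = $text =~ /FIRST/i ? 1 : 0;    # 1

# Without /m, ^ matches only at the start of the string.
my $plain = () = $text =~ /^\w+/g;                   # 1 match
# With /m, ^ matches at the start of every line.
my $multi = () = $text =~ /^\w+/mg;                  # 2 matches

# \A still means "start of string", even under /m.
my $at_start = $text =~ /\Asecond/m ? 1 : 0;         # 0

# \z matches only at the very end; \Z tolerates a final newline.
my $strict_end = $text =~ /line\z/ ? 1 : 0;          # 0
my $loose_end  = $text =~ /line\Z/ ? 1 : 0;          # 1

print "$case_insensitive $plain $multi $at_start $strict_end $loose_end\n";
```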
    • Sets of capturing parentheses are stored in the numbered variables $1, $2, and so on
    • Groups are numbered left to right, in the order of their opening parentheses:

        my $str = "abc";
        $str =~ /(((a)(b))(c))/;
        print "1: $1 2: $2 3: $3 4: $4 5: $5\n";
        # prints: 1: abc 2: ab 3: a 4: b 5: c

    Avoid capture with ?:

    • If a parenthesis is followed by ?:, the group will not be captured
    • Useful if you don't want the matches to be saved
        my $str = "abc";
        $str =~ /(?:a(b)c)/;
        print "$1\n"; # prints "b"
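    A common reason to avoid capturing is grouping an alternation without shifting the numbering of the groups you do care about. A small sketch, with a made-up input string:

```perl
use strict;
use warnings;

my $str = 'colour: grey';

my $shade;
# Both (?:...) groups only group; the single capturing group is $1.
if ( $str =~ /(?:color|colour): (gr(?:a|e)y)/ ) {
    $shade = $1;
    print "Shade: $shade\n";
}
```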

    Allow easier reading with the /x switch

    • If you're doing something tricky with a regex, comment it.
    • You can do this with the /x flag.
      This ugly behemoth

        my ($num) = $ARGV[0] =~ m/^\+?((?:(?<!\+)-)?(?:\d*\.)?\d+)$/;

    is more readable with whitespace and comments, as allowed by the /x flag.

        my ($num) =
            $ARGV[0] =~ m/^ \+?   # An optional plus sign, to be discarded
            (                     # Capture...
              (?:(?<!\+)-)?       # a negative sign, if there's no plus behind it,
              (?:\d*\.)?          # an optional number, followed by a point if a decimal,
              \d+                 # then one or more digits.
            )$/x;
    • Whitespace and comments are stripped unless escaped.
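    Because /x strips unescaped whitespace and treats # as the start of a comment, literal spaces and hash signs have to be escaped or put in a character class. A minimal sketch, using an invented input line:

```perl
use strict;
use warnings;

my $line = 'item #42';

my ($id) = $line =~ m/
    item    # the literal word "item"
    [ ]     # a literal space, written as a character class
    \#      # a literal hash sign, escaped so it does not start a comment
    (\d+)   # capture the digits
/x;
print "id: $id\n";

# Unescaped, the space is dropped and "#42" becomes a comment,
# so this pattern is effectively just "item".
my $loose = 'itemized' =~ /item #42/x ? 1 : 0;
print "loose match: $loose\n";
```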

    Automatically quote your regexes with \Q and \E

    • Automatically escapes regex metacharacters
    • Won't escape dollar signs: a $ inside \Q...\E still starts a variable interpolation
        my $num = '3.1415';
        print "ok 1\n" if $num =~ /\Q3.14\E/;
        $num = '3X1415';
        print "ok 2\n" if $num =~ /\Q3.14\E/;
        print "ok 3\n" if $num =~ /3.14/;

    prints

        ok 1
        ok 3
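    \Q is most useful around interpolated variables, where the text to match is not known in advance. The sketch below assumes a hypothetical $needle holding user-supplied text, and also shows the quotemeta function, which performs the same escaping:

```perl
use strict;
use warnings;

my $needle = 'a.b*c';    # hypothetical user input full of metacharacters

# Quoted: the variable's contents are matched literally.
my $literal  = 'xx a.b*c yy' =~ /\Q$needle\E/ ? 1 : 0;    # 1
my $no_match = 'aXbbbc'      =~ /\Q$needle\E/ ? 1 : 0;    # 0

# Unquoted: "." and "*" act as metacharacters, so this matches.
my $unquoted = 'aXbbbc' =~ /$needle/ ? 1 : 0;             # 1

# quotemeta returns the escaped string itself.
my $quoted = quotemeta $needle;
print "$literal $no_match $unquoted $quoted\n";
```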
    • The /e flag on s/// allows arbitrary code to compute the replacement string
    • Use and friends if necessary
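    The bullet above about arbitrary code replacing a string reads like a description of the /e modifier on s///. Assuming that is what was meant, a minimal sketch with made-up data looks like this:

```perl
use strict;
use warnings;

my $prices = 'apple 2, pear 5';

# With /e, the replacement part is evaluated as Perl code for each match.
( my $doubled = $prices ) =~ s/(\d+)/$1 * 2/ge;
print "$doubled\n";
```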

    Know when to use study

    "This is a very long [… 900 characters skipped…] string that I have here, ending at position 1000"

    Now, if you are matching this against the regex /Icky/, the matcher will try to find the first letter "I" that matches. That may take scanning through the first 900+ characters until you get to it. But what study does is build a table of the 256 possible bytes and where they first appear, so that in this case, the scanner can jump right to that position and start matching.

    Handle multi-line regexes

    Use re => debug

        -Mre=debug
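    The same pragma can be enabled from inside a script with use re 'debug'. The sketch below is only an illustration: the exact trace written to STDERR varies between Perl versions, so no output is shown.

```perl
use strict;
use warnings;
use re 'debug';    # traces regex compilation and matching on STDERR

my $matched = 'Perl 101 rocks' =~ /(\d+)/ ? 1 : 0;
print "Matched: $matched\n";
```

    From the command line, the equivalent is to pass -Mre=debug to perl when running the script.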