I’ve been using a regular expression (PHP’s preg_match function) to parse email addresses. The addresses follow a consistent pattern that looks like this:
The XXXXXX section is a random string drawn from the characters a-z, A-Z, 0-9 and the hyphen (-). The regular expression’s job is to extract that string, which is easy to do:
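The original address and pattern are not reproduced in this post, so here is only a minimal sketch under stated assumptions. The anamehere+ prefix comes from the lazy pattern given later in the post. The greedy pattern /anamehere\+(.*)@.*$/ is assumed to be that same pattern without the ?. The token aB3-x9Qz, the example.com domain and the helper name extractTokenGreedy are all made up for illustration.

```php
<?php
// Hypothetical helper: returns the token after "anamehere+", or null.
// The pattern is an assumption (the post's lazy pattern with the "?" removed).
// It is greedy, which is harmless while the address contains exactly one "@".
function extractTokenGreedy(string $address): ?string
{
    if (preg_match('/anamehere\+(.*)@.*$/', $address, $matches) === 1) {
        return $matches[1]; // capture group 1 holds the token
    }
    return null; // the address does not follow the expected format
}

// Made-up address in the expected format (example.com is a placeholder).
echo extractTokenGreedy('anamehere+aB3-x9Qz@example.com'), "\n"; // aB3-x9Qz
```

With a well-formed address there is only one @, so it makes no difference that (.*) is greedy.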
Since the format of the email address is consistent, it was easy to pull out the section I was interested in. Well, it was easy until it broke when an email was sent from a particular email client. It turns out this client set the value of the email address to something different:
After getting over my frustration at this email client, I looked at what the regular expression matched:
Well, that was no good. Instead of matching just the XXXXXX, it was slurping up other portions of the email address as well. Enter the greed factor of Perl Compatible Regular Expressions (PCRE). If you aren’t familiar with greedy regular expressions, it simply means that quantifiers such as * match as much text as they can while still letting the overall pattern succeed. Fortunately, there is a way to turn off the greediness: the U pattern modifier. By adding a U to the end of the regular expression, things got much better:
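To see why the U modifier helps, here is a hedged sketch that runs an assumed greedy pattern, /anamehere\+(.*)@.*$/, with and without U against a made-up address. The misbehaving client's actual value is not shown in this post. The display-name style string below, in which the address appears twice, is purely an assumption chosen so that it contains more than one @.

```php
<?php
// Hypothetical "odd" address with two "@" characters (an assumption; the
// real value from the misbehaving client is not shown in the post).
$odd = 'anamehere+aB3-x9Qz@example.com <anamehere+aB3-x9Qz@example.com>';

// Greedy: (.*) first consumes the rest of the string, then backtracks only
// until an "@" follows it, which stops it at the LAST "@".
preg_match('/anamehere\+(.*)@.*$/', $odd, $greedy);

// U modifier: quantifiers become ungreedy, so (.*) stops at the FIRST "@".
preg_match('/anamehere\+(.*)@.*$/U', $odd, $ungreedy);

echo $greedy[1], "\n";   // aB3-x9Qz@example.com <anamehere+aB3-x9Qz
echo $ungreedy[1], "\n"; // aB3-x9Qz
```

The trailing .*$ still reaches the end of the string under U, because the lazy .* has to keep expanding until $ can match.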
Now the regular expression extracted just the XXXXXX, even from the oddball email address.
Adding the U at the end turns off greediness for every quantifier in the regular expression. You can instead turn off the greediness of a single quantifier (the * in this case) by following it with a ?. Using that technique, the regular expression becomes
anamehere\+(.*?)@.*$ and it works just as well.
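Here is the per-quantifier version as a sketch, run against a made-up, hypothetical address that contains two @ characters. The second match shows a PCRE detail worth knowing: U inverts greediness rather than simply switching it off. Combining U with (.*?) therefore makes that quantifier greedy again.

```php
<?php
// Hypothetical address with two "@" characters (an assumption; the real
// value from the misbehaving client is not shown in the post).
$odd = 'anamehere+aB3-x9Qz@example.com <anamehere+aB3-x9Qz@example.com>';

// Lazy quantifier: only (.*?) is ungreedy, so it stops at the FIRST "@".
preg_match('/anamehere\+(.*?)@.*$/', $odd, $lazy);
echo $lazy[1], "\n"; // aB3-x9Qz

// Pitfall: U inverts greediness, so with U the "?" makes (.*?) greedy
// again and the capture runs up to the LAST "@".
preg_match('/anamehere\+(.*?)@.*$/U', $odd, $flipped);
echo $flipped[1], "\n"; // aB3-x9Qz@example.com <anamehere+aB3-x9Qz
```

Marking just the one quantifier with ? keeps the change local, so the rest of the pattern behaves as it did before.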
If you find that your regular expressions are matching more than you want, remember that their quantifiers are greedy by default.