(Don’t miss this totally sweet, find-as-you-type regexp tester!)

Jenny (THE hackaddict) asked me today how to quickly reformat an HTML document she was editing, and the best solution turned out to be a regular expression find and replace. Regexps can look a little scary at first, but they’re actually pretty easy to learn, super useful for the tedious text formatting you might do every day, and they can even be kind of fun to write once you get the hang of them. (I regard them as brainteasers.)

For a simple example of how you might use a regular expression, suppose someone sent you a list of email addresses like this:

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]


and you wanted to change this list to a comma-separated list that you could put in the To: field of your email client, like this:

[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]

If you have only 10 email addresses, it might be fairly easy just to arrow to the end of each line, hit delete, and type a comma, but if you have 20, 100, or 1000 email addresses, that would simply be a waste of time.

A smarter way to do this is a find and replace that swaps each end-of-line character for a comma. You could accomplish this with any text editor that allows regular expression search, such as TextMate.

Open your list in a new document, then open a Find dialogue box. In the Find: field, enter

(m)\n

The \n stands for an end-of-line character, and I put (m) before it because I knew that every email address line ended with the character m. (This was not strictly necessary, but there was some other text that I did not want to affect, and it also lets me demonstrate another facet of regexps.) The parentheses () around the m capture and save whatever they enclose. Since there is only one set of parentheses, the captured text is stored in a variable named $1 that we can use in the Replace: field. (If there were more sets of parentheses, their contents would be stored in $2, $3, etc., numbered left to right.)
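If you’d like to see the numbering of captured groups outside a text editor, here is a minimal sketch in Python. The sample address and the two-group pattern are made up for illustration; they are not part of the original list, and Python reads groups with match.group(n) rather than $n.

```python
import re

# Hypothetical sample line, invented for this illustration.
line = "alice@example.com\n"

# Two sets of parentheses: the first captures the name before the @,
# the second captures the domain after it. \n matches the end of the line.
match = re.search(r"(\w+)@([\w.]+)\n", line)

name = match.group(1)    # what the editor would call $1
domain = match.group(2)  # what the editor would call $2

print(name, domain)
```

Groups are counted by their opening parenthesis, left to right, which is why the name ends up in group 1 and the domain in group 2.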

In the Replace: field, enter

$1,

This inserts whatever was captured by the first set of parentheses (viz. m) and a comma. Clicking Replace All converts linebreaks to commas for all lines ending with m.
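The same find and replace can be tried in a script. This is only a rough sketch in Python, assuming a made-up list of addresses (the example.com names are placeholders, not the real list). Note that Python writes the captured group as \1 in the replacement, where the editor’s Replace: field uses $1.

```python
import re

# Hypothetical addresses, invented for this sketch; every line ends in "m".
text = "alice@example.com\nbob@example.com\ncarol@example.com\n"

# Same pattern as the Find: field: capture the m, then match the line break.
# The replacement puts the captured m back and adds a comma.
joined = re.sub(r"(m)\n", r"\1,", text)

# The last line also ends in a line break, so it picks up a comma too.
# Strip it before pasting the result into a To: field.
to_field = joined.rstrip(",")

print(to_field)
```

Without the \1 in the replacement, the m itself would be swallowed along with the line break, since the whole match is replaced.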

This is a somewhat trivial example that doesn’t do full justice to the power of regular expressions, but if I could recommend one piece of programming knowledge to non-programmers, it would be “learn regular expressions.” I use them constantly to automate tedious tasks and they’ve saved me countless hours.

This page has some great info on how to use regular expressions, and I recently discovered a web-based regular expression tester, inspired by the magnificent RX-toolkit in Komodo IDE, that is a great place to experiment with regexps. You can enter some sample text to search, then try writing regular expressions and watch them match in real time as you type. Indispensable!