Dec 3, 2011

Regular Expressions


This is a super-awesome website that parses any regex you throw at it into plain ole English!

Match the boundaries of a word
Exceedingly useful for inserting quotes and commas before and after words
\<([^ ]*)\>

This will find the boundaries of each word:
\<: Start of the word
( ): Groups the enclosed pattern into a submatch that can be referenced later. Can also contain a list of options each separated by a "pipe" |
[]: Restricts the possible values that the pattern will match in a particular position.
[^ ]: Inside the bracket the caret tells the parser to OMIT the characters listed. Here, omit the space, so any character except a space will match.
*: Any number of the preceding character are allowed but none are required
\>: End of the word

In the replace box:
\1: This is a back reference to the submatch within the 1st parentheses.
"\1": Tells Geany to put quotes around the 1st submatch, i.e. around each word.

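The same find-and-replace can be tried outside the editor. This is only a rough Python sketch, with two assumptions that are not in the original: Python's re module has no \< and \>, so \b (word boundary) stands in for both, and + is used instead of * so that empty matches are never quoted. The sample text is made up.

```python
import re

text = "apple banana cherry"

# \b (word boundary) stands in for \< and \>, which Python's re does not support.
# + replaces * so that empty matches are never wrapped in quotes.
quoted = re.sub(r"\b([^ ]+)\b", r'"\1"', text)

print(quoted)  # "apple" "banana" "cherry"
```
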
Positional Characters:
^: Caret anchors the pattern to the START of the line.
$: Dollar symbol anchors the pattern to the END of the line.
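
A small Python sketch of the two anchors, assuming made-up sample lines. re.search is used so that only the anchors decide where the match may sit.

```python
import re

lines = ["error: disk full", "no error here", "fatal error"]

# ^ pins the pattern to the start of the string.
starts_with_error = [s for s in lines if re.search(r"^error", s)]

# $ pins the pattern to the end of the string.
ends_with_error = [s for s in lines if re.search(r"error$", s)]

print(starts_with_error)  # ['error: disk full']
print(ends_with_error)    # ['fatal error']
```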

Wildcards:
.: Matches any single character
\.: Matches a literal period (a decimal point)
*: Any number of the preceding character are allowed but none are required
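
The three wildcards can be checked quickly in Python. The sample strings below are invented for illustration.

```python
import re

# . matches any single character between c and t.
dot_matches = re.findall(r"c.t", "cat cot cut ct")

# \. matches only a literal period, so "3x5" is skipped.
literal_dot = re.findall(r"\d\.\d", "3.5 and 3x5")

# b* allows zero or more b characters after the a.
star_matches = re.findall(r"ab*", "a ab abbb")

print(dot_matches)   # ['cat', 'cot', 'cut']
print(literal_dot)   # ['3.5']
print(star_matches)  # ['a', 'ab', 'abbb']
```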

Character Classes
[] Restricts the possible values that the pattern will match in a particular position.
[A-Z] Supports ranges of characters within the brackets.

Combo Meta Characters and Character Classes
^[A-Z]: Matches any line starting with a capital letter.
[0-9] or \d: Matches a single digit between 0 and 9, so it identifies any word with at least one digit in it.

Omission: Double meaning of Caret
[^0-9] or \D: Inside the bracket it tells the parser to OMIT the characters listed. \D is a non-digit.
^[0-9]: Outside the bracket it tells the parser to match the characters listed at the BEGINNING of the line.
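
To see the two meanings of the caret side by side, here is a short Python sketch with made-up inputs.

```python
import re

# Inside the brackets: ^ negates the class, so only non-digits match.
non_digits = re.findall(r"[^0-9]", "a1b2")
also_non_digits = re.findall(r"\D", "a1b2")

# Outside the brackets: ^ anchors the digit to the start of the string.
starts_with_digit = bool(re.search(r"^[0-9]", "7 days"))
digit_later = bool(re.search(r"^[0-9]", "in 7 days"))

print(non_digits)         # ['a', 'b']
print(starts_with_digit)  # True
print(digit_later)        # False
```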

The Repetition Indicator
Combine the wildcard and repetition indicator. Examples:
^\D*: Starts with a series of non-digits
\D{2,6}: Two to six non-digits (no space after the comma, or most parsers will not treat it as a repetition)
\d{5}: Five digits
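
A quick Python check of the repetition examples. Using re.fullmatch to test the whole string is a choice made for this sketch, and the inputs are invented.

```python
import re

# ^\D* grabs the leading run of non-digits (possibly empty).
leading = re.match(r"^\D*", "abc123").group()

# {2,6} accepts between two and six repetitions.
two_to_six = [s for s in ["a", "abcd", "abcdefg"] if re.fullmatch(r"\D{2,6}", s)]

# {5} accepts exactly five repetitions.
five_digits = [s for s in ["9021", "90210", "902101"] if re.fullmatch(r"\d{5}", s)]

print(leading)      # abc
print(two_to_six)   # ['abcd']
print(five_digits)  # ['90210']
```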

The Optional Indicator
,?: The character (in this case the comma) preceding the ? may occur one time or it may not.
.+,?: Look for at least one character prior to the optional comma. The + means one or more of the preceding character.
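
A hedged Python sketch of the optional comma. The names and the \w+ pattern are illustrative assumptions, not taken from the post.

```python
import re

names = ["Smith, John", "Smith John", "Smith; John"]

# ,? lets the comma appear once or not at all.
comma_optional = [s for s in names if re.search(r"Smith,? John", s)]

# One or more word characters, then an optional comma.
pieces = re.findall(r"\w+,?", "Smith, John")

print(comma_optional)  # ['Smith, John', 'Smith John']
print(pieces)          # ['Smith,', 'John']
```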

A Range of Options
( ): Can contain a list of options each separated by a "pipe" |

Ignore Case
/i: An i after the second forward slash tells the parser to ignore case.
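
Python does not use the /pattern/i notation, so this sketch assumes the re.IGNORECASE flag as the equivalent of /i. It also shows a list of options separated by a pipe. The word list is made up.

```python
import re

# (cat|dog) matches either option; re.IGNORECASE plays the role of /i.
pets = re.compile(r"(cat|dog)", re.IGNORECASE)

found = pets.findall("Cat, DOG, bird")

print(found)  # ['Cat', 'DOG']
```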

Complicated Expressions

("/(int.*l?|wo?r?ld|glo?b[ae]?l?)/i")
/: start parsing
int.*l?: words starting with int, followed by any single character, in fact any number of single characters are allowed but none are required (i.e., it could just be "int"), followed by an optional l
|: OR
wo?r?ld: w, then o may occur one time or it may not, r may occur one time or it may not, followed by ld
|: OR
glo?b[ae]?l?: gl, then o may occur one time or it may not, b, then either a or e may occur one time or it may not, l may occur one time or it may not
/i: end parsing, ignore case
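
To try the expression out, here is a rough Python version. It assumes re.IGNORECASE in place of the trailing /i and drops the surrounding slashes and quotes; the test words are invented.

```python
import re

# Same alternation as above; re.IGNORECASE stands in for /i.
pattern = re.compile(r"(int.*l?|wo?r?ld|glo?b[ae]?l?)", re.IGNORECASE)

words = ["International", "World", "wld", "Global", "glbl", "domestic"]
matched = [w for w in words if pattern.search(w)]

print(matched)  # ['International', 'World', 'wld', 'Global', 'glbl']
```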

Writing complicated expressions
Start with simple building blocks
Create a variable to match misspellings of the word headache
he?a?d.?[ -]?(ache)
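
A short Python check of which spellings this building block accepts. The list of spellings is made up for the sketch.

```python
import re

# The building block from above, stored in a variable.
headache = re.compile(r"he?a?d.?[ -]?(ache)")

spellings = ["headache", "head ache", "head-ache", "hedache", "hadache", "heartache"]
accepted = [s for s in spellings if headache.search(s)]

print(accepted)  # ['headache', 'head ache', 'head-ache', 'hedache', 'hadache']
```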


Concatenate regular expressions with ||
Separate them with .*: the patterns can then be separated by any kind of text
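
In Python the || operator is not available for strings, so this sketch assumes ordinary + concatenation instead. The fever pattern and the sample sentences are hypothetical.

```python
import re

# Two building blocks kept as plain strings.
headache = r"he?a?d.?[ -]?ache"
fever = r"fever"  # hypothetical second pattern

# .* between the blocks allows any text to sit between them.
combined = headache + r".*" + fever

in_order = bool(re.search(combined, "hedache with mild fever"))
reversed_order = bool(re.search(combined, "fever, then hedache"))

print(in_order)        # True
print(reversed_order)  # False
```

The order matters: the combined pattern only matches when the first block appears before the second.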


Standardizing Spelling
s/: indicates that the regular expression will be used in a substitution
The regular expression parser will search for the pattern between the first two forward slashes and replace it with the text found between the second and the third forward slashes.
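
Python has no s/pattern/replacement/ syntax, so this sketch assumes re.sub as the closest equivalent: the first argument is the pattern, the second is the replacement text. The sample sentence is made up.

```python
import re

note = "hedache and head-ache"

# Equivalent in spirit to s/he?a?d.?[ -]?ache/headache/ applied to every match.
standardized = re.sub(r"he?a?d.?[ -]?ache", "headache", note)

print(standardized)  # headache and headache
```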
