Regular Expression Features
Welcome back to the next lesson in how to use regular expressions in PowerShell. Once you understand the basic capture patterns like \w+, everything else builds on top of that. This is where things begin getting complex and patterns start making your head spin. Let me see if I can ease some of that anxiety.
Ranges
Some character sets, like \w and \d, encompass a range of characters. However, you can create patterns that match on a narrower range. Enter the characters you want to match in square brackets.
PS C:\> "PowerShell" -match "PowerSh[ae]ll" ? $matches : $null
Name Value
---- -----
0 PowerShell
PS C:\> "PowerShill" -match "PowerSh[ae]ll" ? $matches : $null
In this example, I can match on either a or e. The second string fails because i isn't one of the listed characters. For alphabetic characters, you can also specify a literal range.
PS C:\> "PowerShell" -match "PowerS[a-h]ell" ? $matches : $null
Name Value
---- -----
0 PowerShell
PS C:\> "PowerSeell" -match "PowerS[a-h]ell" ? $matches : $null
Name Value
---- -----
0 PowerSeell
PS C:\> "PowerSwell" -match "PowerS[a-h]ell" ? $matches : $null
My pattern is looking for any letter between a and h after the S. Remember, in PowerShell, comparisons are case-insensitive by default so [a-h] is the same as [A-H].
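If case matters, the case-sensitive form of the operator, `-cmatch`, is worth knowing. Here is a minimal sketch; the variable names and the `PowerSHell` test string are my own, for illustration only.

```powershell
# -match ignores case, so [a-h] also accepts A-H
$insensitive = "POWERSHELL" -match "PowerS[a-h]ell"      # True

# -cmatch is the case-sensitive variant of the same operator
$upperH = "PowerSHell" -cmatch "PowerS[a-h]ell"          # False: H is outside a-h
$lowerH = "PowerShell" -cmatch "PowerS[a-h]ell"          # True

$insensitive, $upperH, $lowerH
```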
Here's another example using numbers.
PS C:\> "Power2026" -match "[0-5]+" ? $matches : $null
Name Value
---- -----
0 202
> I seem to recall that in earlier versions of the .NET regular expression library, which is what PowerShell uses under the hood, this example wouldn't work. I'm glad that it does now.
Did you notice the quantifier in the pattern? I'm looking for one or more matches of the preceding pattern, which is any digit from 0 through 5. 6 isn't in that range, so the match is only on the first three digits.
Let's have a pop quiz. Can you predict or explain what this pattern is trying to match on?
PS C:\> "Power2026" -match "[0-5]{4}" ? $matches : $null
Do you understand why it failed to match? Let me revise the pattern.
PS C:\> "Power2026" -match "[0-6]{4}" ? $matches : $null
Name Value
---- -----
0 2026
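The pattern is looking for exactly 4 consecutive matches of the preceding pattern, i.e. the digits 0 through 6. The first attempt failed because 6 is outside [0-5], which left only three consecutive matching digits.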
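To compare the two quantifiers side by side, here is a rough sketch that runs both patterns over a few strings. The extra sample strings and property names are made up for illustration. I'm calling the .NET `[regex]::Match()` method directly because its `Value` property is simply an empty string when nothing matches.

```powershell
# Hypothetical sample strings; only the first comes from the lesson
$samples = "Power2026", "Power2025", "Power20"

$results = foreach ($s in $samples) {
    [pscustomobject]@{
        Input       = $s
        OneOrMore   = [regex]::Match($s, "[0-5]+").Value     # longest run at the first match position
        ExactlyFour = [regex]::Match($s, "[0-5]{4}").Value   # empty unless 4 in a row qualify
    }
}

$results | Format-Table
```

Only Power2025 should produce a value in the ExactlyFour column, because it is the only string with four consecutive digits in the 0-5 range.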
Not in the Range
It is also possible to turn this on its head. I'm inserting a caret (^) at the start of the bracketed set, which means *don't match on the following characters*.
PS C:\> "Power2026" -match "Power[^0-6]" ? $matches : $null
PS C:\> "Power7788" -match "Power[^0-6]{4}" ? $matches : $null
Name Value
---- -----
0 Power7788
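One thing to keep in mind is that a negated set still has to match a character, and that character can be anything outside the set, not just another digit. A small sketch with a few test strings of my own:

```powershell
$pattern = "Power[^0-6]"

$seven   = "Power7" -match $pattern   # True: 7 is outside 0-6
$letter  = "PowerX" -match $pattern   # True: a letter is also "not 0-6"
$nothing = "Power"  -match $pattern   # False: there is no character left to test
$three   = "Power3" -match $pattern   # False: 3 is inside 0-6

$seven, $letter, $nothing, $three
```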
This is where things can get tricky. I'm using a negative pattern, but it might be easier to use a negative operator with a positive pattern.
PS C:\> "Power7879" -notMatch "Power[0-6]"
True
It ultimately comes down to the context of how you are using the regular expression comparison.
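The two approaches are not interchangeable, which is why context matters. As a quick sketch (the extra test strings and property names are assumptions for illustration), compare how each one treats a string that doesn't contain Power at all:

```powershell
$names = "Power7879", "Power2026", "Tower7879"

$comparison = foreach ($n in $names) {
    [pscustomobject]@{
        Name         = $n
        NegatedClass = $n -match "Power[^0-6]"      # needs Power plus a character outside 0-6
        NotMatch     = $n -notmatch "Power[0-6]"    # True for anything lacking Power + 0-6
    }
}

$comparison | Format-Table
```

Both columns agree for the first two strings. Tower7879 is where they split: the negated pattern is False because there is no Power to match, while `-notmatch` is True for the very same reason.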
Grouping
Related to ranges is the concept of grouping. This is where you can look for a group of characters in a given order. Think of it as a substring match.
PS C:\> "PowerShell" -match "(ell)" ? $matches : $null
Name Value
---- -----
1 ell
0 ell
PS C:\> "PowerSwell" -match "(ell)" ? $matches : $null
Name Value
---- -----
1 ell
0 ell
PS C:\> "PowerShemp" -match "(ell)" ? $matches : $null
PS C:\>
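In the examples above, entries 0 and 1 are identical because the group is the entire pattern. Entry 0 is always the whole match and entry 1 is the first group. Here is a small sketch, using a pattern of my own, where the two differ. I'm using a plain `if` here, which also works in Windows PowerShell where the ternary operator (added in PowerShell 7) isn't available.

```powershell
if ("PowerShell" -match "Sh(ell)") {
    $whole = $matches[0]   # the entire match: Shell
    $group = $matches[1]   # the first capture group: ell
    "Whole match: $whole"
    "Group 1:     $group"
}
```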
Grouping is often used with the | character, which indicates alternation: match the pattern on either side of it.
```powershell
PS C:\> "PowerShell" -match "(p|t)ower((sh)|(sm)|(gr))\w+" ? $matches : $null
Name Value
---- -----
3 Sh
2 Sh
1 P
0 PowerShell
```
The patterns are getting a little more complicated. My string needs to match on either a `P` or `T` followed by the string `ower`. After that, I want one of `sh`, `sm`, or `gr`, followed by one or more word characters. Notice I'm grouping these alternatives in another set of parentheses.
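Groups are numbered by their opening parenthesis, counting from left to right, and `$matches` only lists the groups that actually captured something. That is why `Sh` shows up as both 2 (the outer group) and 3 (the inner `sh` group). If you don't need to know which alternative matched, a flatter pattern does the same job with fewer groups. This is only a sketch of an alternative, not a correction to the pattern above.

```powershell
# One group for the first letter, one group for all three alternatives
$pattern = "(p|t)ower(sh|sm|gr)\w+"

if ("PowerShell" -match $pattern) {
    $groupNumbers = @($matches.Keys | Sort-Object)   # 0, 1, 2
    $prefix       = $matches[2]                       # Sh
    "Groups: $($groupNumbers -join ', ')"
    "Prefix: $prefix"
}
```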
See if you can understand why these examples match or don't.
```powershell
PS C:\> "TowerSmell" -match "(p|t)ower((sh)|(sm)|(gr))\w+" ? $matches : $null
Name Value
---- -----
4 Sm
2 Sm
1 T
0 TowerSmell
PS C:\> "Powergreen" -match "(p|t)ower((sh)|(sm)|(gr))\w+" ? $matches : $null
Name Value
---- -----
5 gr
2 gr
1 P
0 Powergreen
PS C:\> "TowerSwell" -match "(p|t)ower((sh)|(sm)|(gr))\w+" ? $matches : $null
PS C:\>