Climbing Higher in the Abstract Syntax Tree
We've started exploring how to use the Abstract Syntax Tree (AST) to analyze PowerShell code. You can use the AST to dissect PowerShell code into its fundamental elements. You might use this to build troubleshooting or debugging tools. I like to use the AST to build tools that help me manage my code. I showed you an example last time, and there will be more before we are through.
In the previous article, we looked in general at the AST object. I'm going to continue with the sample script I showed last time. Let's get the AST again.
$Path = 'c:\scripts\sample-script.ps1'
New-Variable astTokens -Force
New-Variable astErr -Force
$AST = [System.Management.Automation.Language.Parser]::ParseFile($Path, [ref]$astTokens, [ref]$astErr)
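If you don't have a script file handy, you can follow along by parsing a string instead. This is only a minimal sketch: it assumes the ParseInput() static method of the same Parser class, and the two-line sample and the $code, $tokens, and $errors names are hypothetical, not taken from my sample script.

```powershell
# Hypothetical two-line sample, not the sample-script.ps1 used in this article
$code = @'
$now = Get-Date
Write-Host 'finished'
'@

# Placeholders that the parser fills in through [ref]
$tokens = $null
$errors = $null

# ParseInput() parses a string instead of a file
$ast = [System.Management.Automation.Language.Parser]::ParseInput($code, [ref]$tokens, [ref]$errors)

# One line of text per token, in source order
$tokens | ForEach-Object { '{0,-16} {1}' -f $_.Kind, $_.Text }
```

The token collection you get back should behave the same way as $astTokens in the rest of this article.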
Tokens
The $astTokens variable is a collection of PowerShell scripting elements. These are the building blocks that make up your PowerShell code. My simple sample script breaks down into 131 tokens.
PS C:\> $astTokens.count
131
There are different token types stored in $astTokens.
PS C:\> $astTokens | Get-Member | Select-Object TypeName -Unique
TypeName
--------
System.Management.Automation.Language.Token
System.Management.Automation.Language.VariableToken
System.Management.Automation.Language.StringLiteralToken
System.Management.Automation.Language.NumberToken
System.Management.Automation.Language.StringExpandableToken
System.Management.Automation.Language.ParameterToken
You can use Get-TypeMember from the PSScriptTools module to view the properties of each token type.
PS C:\> Get-TypeMember System.Management.Automation.Language.Token
   Type: System.Management.Automation.Language.Token

Name       MemberType ResultType    IsStatic IsEnum
----       ---------- ----------    -------- ------
GetType    Method     Type
Extent     Property   IScriptExtent
HasError   Property   Boolean
Kind       Property   TokenKind              True
Text       Property   String
TokenFlags Property   TokenFlags             True
PS C:\> Get-TypeMember System.Management.Automation.Language.ParameterToken
   Type: System.Management.Automation.Language.ParameterToken

Name          MemberType ResultType    IsStatic IsEnum
----          ---------- ----------    -------- ------
GetType       Method     Type
Extent        Property   IScriptExtent
HasError      Property   Boolean
Kind          Property   TokenKind              True
ParameterName Property   String
Text          Property   String
TokenFlags    Property   TokenFlags             True
UsedColon     Property   Boolean
We're going to dive into this in a moment, but this is how the token types break down for my code.
PS C:\> $astTokens | Group-Object {$_.GetType().name}
Count Name Group
----- ---- -----
2 NumberToken {2, 3}
3 ParameterToken {-ForegroundColor, -FilterHashtable, -ForegroundColor}
2 StringExpandableToken {"Querying logs on $Computername", "Found $($entries.Count) entries in $log"}
7 StringLiteralToken {'System', 'Application', 'Windows PowerShell', Get-Date…}
98 Token {#requires -version 5.1, …
19 VariableToken {$Computername, $Env:COMPUTERNAME, $Credential, $logs…}
Each token type has different properties, so you can't easily display all of $astTokens in a single command. You could, however, use a grouped hash table.
PS C:\> $h = $astTokens | Group-Object {$_.GetType().name} -AsHashTable
PS C:\> $h.NumberToken | Format-Table
Value Text TokenFlags Kind HasError Extent
----- ---- ---------- ---- -------- ------
2 2 None Number False 2
3 3 None Number False 3
PS C:\> $h.StringLiteralToken
Value : System
Text : 'System'
TokenFlags : ParseModeInvariant
Kind : StringLiteral
HasError : False
Extent : 'System'
Value : Application
Text : 'Application'
TokenFlags : ParseModeInvariant
Kind : StringLiteral
HasError : False
Extent : 'Application'
Value : Windows PowerShell
Text : 'Windows PowerShell'
TokenFlags : ParseModeInvariant
Kind : StringLiteral
HasError : False
Extent : 'Windows PowerShell'
Value : Get-Date
Text : Get-Date
TokenFlags : CommandName
Kind : Generic
HasError : False
Extent : Get-Date
Value : Write-Host
Text : Write-Host
TokenFlags : CommandName
Kind : Generic
HasError : False
Extent : Write-Host
Value : Get-WinEvent
Text : Get-WinEvent
TokenFlags : CommandName
Kind : Generic
HasError : False
Extent : Get-WinEvent
Value : Write-Host
Text : Write-Host
TokenFlags : CommandName
Kind : Generic
HasError : False
Extent : Write-Host
TokenFlags
One way to organize or work with tokens is by using the TokenFlags property.
PS C:\> $astTokens | Group-Object TokenFlags
Count Name Group
----- ---- -----
30 None {[, $Computername, $Env:COMPUTERNAME, […}
2 Keyword {Param, in}
10 AssignmentOperator {=, =, =, =…}
67 ParseModeInvariant {#requires -version 5.1, …
5 UnaryOperator, ParseMode… {,, ,, ,, ,…}
2 SpecialOperator, Disallo… {., .}
4 CommandName {Get-Date, Write-Host, Get-WinEvent, Write-Host}
6 MemberName {Computername, Date, LogName, Level…}
4 TypeName {string, PSCredential, ordered, PSCustomObject}
1 Keyword, StatementDoesnt… {foreach}
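Some tokens have more than one flag. TokenFlags is a flags enumeration, not an array of strings, so -contains finds nothing in the first command below. The -match operator works because the value is converted to a single comma-separated string before matching.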
PS C:\> $astTokens | where tokenflags -contains SpecialOperator
PS C:\> $astTokens | where tokenflags -match SpecialOperator
Text : .
TokenFlags : SpecialOperator, DisallowedInRestrictedMode
Kind : Dot
HasError : False
Extent : .
Text : .
TokenFlags : SpecialOperator, DisallowedInRestrictedMode
Kind : Dot
HasError : False
Extent : .
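Because TokenFlags is a flags enumeration, you can also test for one flag directly instead of matching against its string form. This is a minimal sketch, assuming the standard .NET HasFlag() method and the -band operator; the one-line sample and the variable names are hypothetical.

```powershell
# Hypothetical sample containing one member-access dot
$code = '$p = Get-Process; $p.Name'
$tokens = $null
$errors = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput($code, [ref]$tokens, [ref]$errors)

$flag = [System.Management.Automation.Language.TokenFlags]::SpecialOperator

# HasFlag() is true when the flag is set, even if other flags are set too
$byHasFlag = @($tokens | Where-Object { $_.TokenFlags.HasFlag($flag) })

# The same test written with the bitwise -band operator
$byBand = @($tokens | Where-Object { ($_.TokenFlags -band $flag) -eq $flag })

$byHasFlag | Select-Object Text, Kind, TokenFlags
```

Either form avoids accidental partial matches that a regular expression pattern could pick up from similarly named flags.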
If you know the flag has a single value, you can use a command like:
PS C:\> $astTokens | where tokenflags -EQ 'commandname' | Format-Table
Value Text TokenFlags Kind HasError Extent
----- ---- ---------- ---- -------- ------
Get-Date Get-Date CommandName Generic False Get-Date
Write-Host Write-Host CommandName Generic False Write-Host
Get-WinEvent Get-WinEvent CommandName Generic False Get-WinEvent
Write-Host Write-Host CommandName Generic False Write-Host
These tokens represent detected command elements in the script. A token exists for each instance found in the code. My script has two lines using Write-Host, so there are two tokens.
The Extent property provides details about the token's location in the script.
PS C:\> $astTokens | where {$_.tokenflags -EQ 'commandname' -And $_.Text -eq 'Write-Host'} | Select -First 1 -ExpandProperty Extent -outVariable ex
File : c:\scripts\sample-script.ps1
StartScriptPosition : System.Management.Automation.Language.InternalScriptPosition
EndScriptPosition : System.Management.Automation.Language.InternalScriptPosition
StartLineNumber : 33
StartColumnNumber : 1
EndLineNumber : 33
EndColumnNumber : 11
Text : Write-Host
StartOffset : 754
EndOffset : 764
This information is also found in the nested position properties.
PS C:\> $ex.StartScriptPosition
File : c:\scripts\sample-script.ps1
LineNumber : 33
ColumnNumber : 1
Line : Write-Host "Querying logs on $Computername" -ForegroundColor Cyan
Offset : 754
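By the way, the Line value in this example is two characters longer than the visible text, most likely because it includes the line-ending characters. Trim() removes them.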
PS C:\> $ex.StartScriptPosition.line.length
67
PS C:\> $ex.StartScriptPosition.line.trim().length
65
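The extent data makes it straightforward to report where each command appears. The following is a rough sketch rather than a finished tool; the three-line sample, the $report variable, and the Command, Line, and Column property names are all hypothetical.

```powershell
# Hypothetical sample: commands on lines 1 and 3
$code = @'
$now = Get-Date

Write-Host "It is $now"
'@
$tokens = $null
$errors = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput($code, [ref]$tokens, [ref]$errors)

# Build one object per command token using its Extent
$report = @($tokens | Where-Object { $_.TokenFlags -eq 'CommandName' } | ForEach-Object {
    [PSCustomObject]@{
        Command = $_.Text
        Line    = $_.Extent.StartLineNumber
        Column  = $_.Extent.StartColumnNumber
    }
})

$report
```

Line and column numbers are 1-based, which matches what an editor shows you.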
Kind
The other way to filter or organize tokens is by the Kind property. This property is an enumeration of token types.
PS C:\> $astTokens | Group kind
Count Name Group
----- ---- -----
18 Variable {$Computername, $Env:COMPUTERNAME, $Credential, $logs…}
1 SplattedVariable {@PSBoundParameters}
3 Parameter {-ForegroundColor, -FilterHashtable, -ForegroundColor}
2 Number {2, 3}
12 Identifier {string, PSCredential, ordered, Computername…}
4 Generic {Get-Date, Write-Host, Get-WinEvent, Write-Host}
36 NewLine {…
9 Comment {#requires -version 5.1, #requires -RunAsAdministrator, <#…
1 EndOfInput {<eof>}
3 StringLiteral {'System', 'Application', 'Windows PowerShell'}
2 StringExpandable {"Querying logs on $Computername", "Found $($entries.Count) entries in $log"}
3 LParen {(, (, (}
3 RParen {), ), )}
1 LCurly {{}
3 RCurly {}, }, }}
4 LBracket {[, [, [, [}
4 RBracket {], ], ], ]}
2 AtCurly {@{, @{}
5 Comma {,, ,, ,, ,…}
2 Dot {., .}
10 Equals {=, =, =, =…}
1 Foreach {foreach}
1 In {in}
1 Param {Param}
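Since Kind is an enumeration, you can list every possible value and compare against the enumeration members themselves. This is a small sketch under that assumption; the sample code and the $allKinds, $comments, and $numbers names are hypothetical.

```powershell
# Every kind the tokenizer can report
$allKinds = [enum]::GetNames([System.Management.Automation.Language.TokenKind])

# Hypothetical sample with two comments and two numbers
$code = @'
# first comment
$total = 2 + 3 # trailing comment
'@
$tokens = $null
$errors = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput($code, [ref]$tokens, [ref]$errors)

# Compare against enumeration members rather than strings
$comments = @($tokens | Where-Object { $_.Kind -eq [System.Management.Automation.Language.TokenKind]::Comment })
$numbers  = @($tokens | Where-Object { $_.Kind -eq [System.Management.Automation.Language.TokenKind]::Number })

$comments.Text
$numbers.Value
```

Number tokens expose a Value property, as you saw in the NumberToken output earlier, so you get the parsed number and not only its text.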
Again, once you have the token, you can explore the available properties.
PS C:\> $astTokens | where kind -eq 'splattedvariable' | Tee-Object -Variable sv
Name : PSBoundParameters
VariablePath : PSBoundParameters
Text : @PSBoundParameters
TokenFlags : None
Kind : SplattedVariable
HasError : False
Extent : @PSBoundParameters
PS C:\> $sv.VariablePath
UserPath : PSBoundParameters
IsGlobal : False
IsLocal : False
IsPrivate : False
IsScript : False
IsUnqualified : True
IsUnscopedVariable : True
IsVariable : True
IsDriveQualified : False
DriveName :
PS C:\> $sv.Extent
File : c:\scripts\sample-script.ps1
StartScriptPosition : System.Management.Automation.Language.InternalScriptPosition
EndScriptPosition : System.Management.Automation.Language.InternalScriptPosition
StartLineNumber : 42
StartColumnNumber : 27
EndLineNumber : 42
EndColumnNumber : 45
Text : @PSBoundParameters
StartOffset : 1053
EndOffset : 1071
Remember, we're still looking at the same token objects. We're just filtering them based on different properties.
PS C:\> $astTokens | where kind -eq 'Generic' | Format-Table
Value Text TokenFlags Kind HasError Extent
----- ---- ---------- ---- -------- ------
Get-Date Get-Date CommandName Generic False Get-Date
Write-Host Write-Host CommandName Generic False Write-Host
Get-WinEvent Get-WinEvent CommandName Generic False Get-WinEvent
Write-Host Write-Host CommandName Generic False Write-Host
Scripting Options
With this background, I can easily identify the PowerShell commands used within my script.
PS C:\> $astTokens.where({$_.TokenFlags -eq 'CommandName'}).Text |
ForEach-Object -begin { $cult = Get-Culture } -process {
$cult.TextInfo.ToTitleCase(($_.ToLower()))} | Select-Object -Unique
Get-Date
Write-Host
Get-WinEvent
I'm using the Where() method for slightly better performance. I have no way of knowing how consistently command names are cased in my scripts, so I'm converting them all to title case. This lets me get a unique list.
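To see why the normalization matters, here is a short sketch with three hypothetical spellings of the same command. Note two assumptions that differ from my snippet above: this version uses the invariant culture and ToLowerInvariant() so the result doesn't depend on your regional settings.

```powershell
# Hypothetical inconsistent spellings of one command
$names = 'write-host', 'Write-Host', 'WRITE-HOST'

# Select-Object -Unique compares case-sensitively, so all three survive
$raw = @($names | Select-Object -Unique)

# Lower-case first: ToTitleCase() leaves all-uppercase words untouched
$textInfo = [System.Globalization.CultureInfo]::InvariantCulture.TextInfo
$normalized = @($names |
    ForEach-Object { $textInfo.ToTitleCase($_.ToLowerInvariant()) } |
    Select-Object -Unique)

$raw.Count
$normalized
```

The ToLower() step is there because ToTitleCase() treats a word written entirely in uppercase as an acronym and skips it.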
With this snippet, I can write a function to parse my script files and return a list of commands used.
Function Get-PSCommand {
    [CmdletBinding()]
    Param(
        [Parameter(
            Position = 0,
            Mandatory,
            ValueFromPipeline,
            HelpMessage = 'The path to the .PS1 script file'
        )]
        [ValidatePattern('.*\.ps1$')]
        [string]$Path
    )
    Begin {
        # Placeholders that the parser fills in through [ref]
        New-Variable astTokens -Force
        New-Variable astErr -Force
    }
    Process {
        Write-Verbose "Parsing $Path"
        $AST = [System.Management.Automation.Language.Parser]::ParseFile($Path, [ref]$astTokens, [ref]$astErr)
        # Keep only command-name tokens and normalize their casing
        # so that Select-Object -Unique returns each command once
        $cmds = $astTokens.where({$_.TokenFlags -eq 'CommandName'}).Text |
            ForEach-Object -begin { $cult = Get-Culture } -process {
                $cult.TextInfo.ToTitleCase(($_.ToLower()))} |
            Select-Object -Unique
        [PSCustomObject]@{
            PSTypeName = 'ScriptCommands'
            Path       = $Path
            Commands   = $cmds
        }
    }
    End {}
}
This function is a proof-of-concept.
PS C:\> dir c:\scripts\*.ps1 | Get-Random -count 5 | Get-PSCommand -Verbose
VERBOSE: Parsing C:\scripts\Dev-Datatable.ps1
VERBOSE: Parsing C:\scripts\Get-DiskInventory.ps1
VERBOSE: Parsing C:\scripts\Basic-HotfixReport.ps1
VERBOSE: Parsing C:\scripts\Get-InfoDemo.ps1
VERBOSE: Parsing C:\scripts\ModuleReport.ps1
Path Commands
---- --------
C:\scripts\Dev-Datatable.ps1 {New-Object, Get-Date, Foreach-Object, Psedit…}
C:\scripts\Get-DiskInventory.ps1 {Write-Verbose, Get-Wmiobject, Sort-Object, Select-Object}
C:\scripts\Basic-HotfixReport.ps1 {Get-Hotfix, Select-Object}
C:\scripts\Get-InfoDemo.ps1 {Read-Host, Get-Wmiobject}
C:\scripts\ModuleReport.ps1 {Start-Transcript, Write-Verbose, Get-Module, Group-Object…}
This is a more accurate way of finding commands than parsing text with regular expressions. The AST can tell when Get-Date is being used as a command and when it is part of a comment.
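Here is a small comparison to illustrate the difference. It is a sketch with a hypothetical three-line sample in which Get-Date appears in a comment, in a string, and once as an actual command; the $textHits and $commandHits names are also hypothetical.

```powershell
# Hypothetical sample: Get-Date appears three times but runs once
$code = @'
# Get-Date is only mentioned in this comment
$note = 'Get-Date inside a string literal'
Get-Date
'@
$tokens = $null
$errors = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput($code, [ref]$tokens, [ref]$errors)

# A plain text search counts every occurrence
$textHits = [regex]::Matches($code, 'Get-Date').Count

# The tokenizer only flags the occurrence used as a command
$commandHits = @($tokens | Where-Object {
    $_.TokenFlags -eq 'CommandName' -and $_.Text -eq 'Get-Date'
}).Count

"Text search: $textHits, command tokens: $commandHits"
```

You could tighten the regular expression, but the token approach gets it right without you having to anticipate every place a command name might show up as plain text.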
Summary
I hope you'll try this out. I will be back next time with more AST-related toolmaking. This is an admittedly advanced topic, so if you have questions, you are not alone. Please don't hesitate to ask in the comments.