Creating a Markdown Tooling System
In the last newsletter, I shared my experiences in creating a PowerShell tool to process Markdown files retrieved using the Buttondown CLI. These files represent each newsletter I have published since 2022. Each file begins with a Markdown metadata header like this one:
---
id: 379f1f33-af6c-4483-967f-c7b0d955fa51
subject: WPF PowerShell Applications
status: sent
email_type: premium
slug: wpf-powershell-applications
publish_date: 2024-03-22T17:16:45.726699Z
---
My goal is to parse this data and create a PowerShell object representation of the metadata. My initial approach to this task was to use regular expressions. Maybe you can apply what I demonstrated last time to your work.
But there is almost always a different way to achieve a goal in PowerShell. Today, I want to share another approach. This is a technique I often use when parsing or working with text files. I'm going to use a generic collection for each file and extract the metadata from there.
Using a Generic Collection
I'll initialize a generic collection designed to hold strings and define the path to a sample file.
$list = [System.Collections.Generic.List[string]]::new()
$Path = 'd:\buttondown\emails\wpf-powershell-applications.md'
It is possible to combine the following steps, but I'll keep them separate for clarity. First, I will read the file's contents.
$content = Get-Content -Path $Path
Pay close attention here. When using regular expressions, I included the -Raw parameter to read the file as a single string. With a generic collection, I leave that parameter off. Without -Raw, the Get-Content cmdlet reads the file line by line and returns an array of strings, which I can add to the generic collection. The [string[]] cast ensures AddRange() receives a string array, even if the file contains only a single line.
$list.AddRange([string[]]$content)
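To see the difference -Raw makes, here is a small self-contained sketch. Instead of a real newsletter file, it writes a few of the sample header lines to a temporary file (the file and its contents are stand-ins I made up), then compares what Get-Content returns with and without -Raw before loading the lines into a List[string].

```powershell
# Stand-in content for a newsletter file (assumption: any small text file behaves the same)
$sample = @(
    '---'
    'id: 379f1f33-af6c-4483-967f-c7b0d955fa51'
    'subject: WPF PowerShell Applications'
    '---'
    '# Body text'
)

$tmp = New-TemporaryFile
try {
    Set-Content -Path $tmp.FullName -Value $sample

    # Without -Raw: one string per line, returned as an array
    $lines = Get-Content -Path $tmp.FullName
    # With -Raw: the whole file as a single string
    $raw = Get-Content -Path $tmp.FullName -Raw

    # Load the line array into a generic list, as in the article
    $list = [System.Collections.Generic.List[string]]::new()
    $list.AddRange([string[]]$lines)

    "Get-Content returns: $($lines.GetType().Name) with $($lines.Count) items"
    "Get-Content -Raw returns: $($raw.GetType().Name)"
    "List count: $($list.Count)"
}
finally {
    # Clean up the temporary file
    Remove-Item -Path $tmp.FullName
}
```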
I know the metadata is bounded by --- lines. If I can find the list index of the opening and closing lines, I can parse the lines in between.
$i = $list.IndexOf('---', 0)
I used the IndexOf method to find the index of the first occurrence of the --- line. The second parameter is optional; it is the index at which to begin searching. Since the metadata header is the first thing in the file, $i will be 0. To find the closing marker, I can use the same method and start searching from the line after the opening marker.
$j = $list.IndexOf('---', $i + 1)
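IndexOf returns -1 when it can't find the item, so a file without a header (or with an unclosed one) would quietly produce bad indices. A minimal guard might look like this; Get-HeaderBounds is a hypothetical helper name, and the two sample lists are made up for illustration.

```powershell
# Hypothetical samples: one list with a header, one without
$withHeader = [System.Collections.Generic.List[string]]::new()
$withHeader.AddRange([string[]]@('---', 'subject: Test', 'status: sent', '---', 'Body'))

$noHeader = [System.Collections.Generic.List[string]]::new()
$noHeader.AddRange([string[]]@('Just body text', 'No metadata here'))

function Get-HeaderBounds {
    # Hypothetical helper: returns the marker indices, or nothing if the header is missing
    param([System.Collections.Generic.List[string]]$List)

    $i = $List.IndexOf('---', 0)
    if ($i -lt 0) { return }            # no opening marker

    $j = $List.IndexOf('---', $i + 1)
    if ($j -lt 0) { return }            # opening marker but no closing marker

    [pscustomobject]@{ Start = $i; End = $j }
}

Get-HeaderBounds -List $withHeader   # Start 0, End 3
Get-HeaderBounds -List $noHeader     # nothing returned
```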
Now I have the indices of the opening and closing markers. I can display the header, markers included.
PS C:\> $list[$i..$j]
---
id: 379f1f33-af6c-4483-967f-c7b0d955fa51
subject: WPF PowerShell Applications
status: sent
email_type: premium
slug: wpf-powershell-applications
publish_date: 2024-03-22T17:16:45.726699Z
---
Or, to get only the lines between the markers:
PS C:\> $list[($i+1)..($j - 1)]
id: 379f1f33-af6c-4483-967f-c7b0d955fa51
subject: WPF PowerShell Applications
status: sent
email_type: premium
slug: wpf-powershell-applications
publish_date: 2024-03-22T17:16:45.726699Z
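The range operator builds a new array by indexing into the list. List[T] also offers a GetRange(index, count) method, which returns a new List[string] directly. This sketch rebuilds the sample header and compares the two approaches, using the same $i and $j marker indices as above.

```powershell
# Rebuild the sample header as a generic list
$list = [System.Collections.Generic.List[string]]::new()
$list.AddRange([string[]]@(
    '---'
    'id: 379f1f33-af6c-4483-967f-c7b0d955fa51'
    'subject: WPF PowerShell Applications'
    'status: sent'
    'email_type: premium'
    'slug: wpf-powershell-applications'
    'publish_date: 2024-03-22T17:16:45.726699Z'
    '---'
))

$i = $list.IndexOf('---', 0)        # opening marker
$j = $list.IndexOf('---', $i + 1)   # closing marker

# Range operator: an array of the lines between the markers
$byRange = $list[($i + 1)..($j - 1)]

# GetRange(startIndex, count): a new List[string] with the same lines
$byGetRange = $list.GetRange($i + 1, $j - $i - 1)

$byRange.Count      # 6
$byGetRange.Count   # 6
```

One reason you might prefer GetRange: if a header were ever empty (the two --- lines adjacent), ($i+1)..($j-1) would become a descending range such as 1..0 and return both marker lines, while GetRange would return an empty list.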
The IndexOf method uses a simple equality comparison. For more complex scenarios, you can use the FindIndex method, which takes a predicate to test each item in the collection. A predicate is a delegate that returns $true or $false, and PowerShell lets you supply a script block in its place.
$i = $list.FindIndex(0, { $args[0] -eq '---' }) + 1
$j = $list.FindIndex($i, { $args[0] -eq '---' }) - 1
The first parameter is the index at which to begin the search, and the second is the predicate script block. $args[0] represents the unnamed argument passed to the script block, which is the contents of each line in the collection. Functionally, my FindIndex syntax does the same thing as IndexOf because I am using the -eq operator to compare each line to the --- string. For more complicated searches, I might use the -like or -match operators or a compound expression. Think of the predicate conceptually as a Where-Object filtering script block.
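Here is a sketch of predicates that do more than an equality test. It assumes you want the index of a particular key rather than the markers, so the first uses -match with an anchored pattern and the second uses a compound expression. Casting the script block to [Predicate[string]] is optional, since PowerShell converts it automatically, but it makes the delegate type explicit. The sample lines are made up for illustration.

```powershell
$list = [System.Collections.Generic.List[string]]::new()
$list.AddRange([string[]]@(
    '---'
    'id: 379f1f33-af6c-4483-967f-c7b0d955fa51'
    'subject: WPF PowerShell Applications'
    'status: sent'
    '---'
    'Subject lines in the body should not match.'
))

# Predicate with a named parameter: true for a line starting with "subject:" plus whitespace
$isSubject = [Predicate[string]] { param($line) $line -match '^subject:\s' }

$subjectIndex = $list.FindIndex(0, $isSubject)
$subjectIndex   # 2

# Compound predicate using $args[0]: a status line that is not a draft
$statusIndex = $list.FindIndex(0, [Predicate[string]] {
    $args[0] -like 'status:*' -and $args[0] -notmatch 'draft'
})
$statusIndex    # 3
```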
My FindIndex calls above also adjust the index numbers, adding 1 to the first and subtracting 1 from the second, so $i is the first line of the metadata and $j is the last. I don't have to deal with the --- markers. With this information, I can process each line and split it into a key/value pair at the first colon.
$list[$i..$j] | ForEach-Object -Begin {
    # Initialize the hashtable
    $meta = @{}
} -Process {
    # Split on the first colon only, so values that contain colons stay intact
    $split = $_ -split ':', 2
    $meta.Add($split[0].Trim(), $split[1].Trim())
} -End {
    $meta
}
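The loop above produces a hashtable, while the stated goal was a PowerShell object. Here is a minimal sketch of one way to finish the job, with two assumptions of mine: an [ordered] dictionary so the properties keep the file's order, and publish_date always being an ISO 8601 timestamp. It also shows why the limit of 2 in the -split expression matters.

```powershell
$header = @(
    'id: 379f1f33-af6c-4483-967f-c7b0d955fa51'
    'subject: WPF PowerShell Applications'
    'status: sent'
    'email_type: premium'
    'slug: wpf-powershell-applications'
    'publish_date: 2024-03-22T17:16:45.726699Z'
)

# Without a limit, the colons inside the timestamp are split too
('publish_date: 2024-03-22T17:16:45.726699Z' -split ':').Count      # 4
# With a limit of 2, only the first colon is used
('publish_date: 2024-03-22T17:16:45.726699Z' -split ':', 2).Count   # 2

# [ordered] keeps keys in file order (a plain @{} does not guarantee order)
$meta = [ordered]@{}
foreach ($line in $header) {
    $key, $value = $line -split ':', 2
    $meta[$key.Trim()] = $value.Trim()
}

# Assumption: publish_date is an ISO 8601 UTC timestamp; keep it as a UTC DateTime
$meta['publish_date'] = [datetime]::Parse(
    $meta['publish_date'],
    [cultureinfo]::InvariantCulture,
    [System.Globalization.DateTimeStyles]::RoundtripKind
)

# Convert the dictionary into a custom object
$post = [pscustomobject]$meta
$post.subject             # WPF PowerShell Applications
$post.publish_date.Year   # 2024
```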