Creating a Markdown Tooling System

June 10, 2025

In the last newsletter, I shared my experience creating a PowerShell tool to process Markdown files retrieved with the Buttondown CLI. Each file represents a newsletter I have published since 2022, and each one begins with a metadata header (often called front matter) like this:
---
id: 379f1f33-af6c-4483-967f-c7b0d955fa51
subject: WPF PowerShell Applications
status: sent
email_type: premium
slug: wpf-powershell-applications
publish_date: 2024-03-22T17:16:45.726699Z
---

My goal is to parse this data and create a PowerShell object representation of the metadata. My initial approach to this task was to use regular expressions. Maybe you can apply what I demonstrated last time to your work.
But there is almost always a different way to achieve a goal in PowerShell. Today, I want to share another approach. This is a technique I often use when parsing or working with text files. I'm going to use a generic collection for each file and extract the metadata from there.
Using a Generic Collection
I'll initialize a generic collection designed to hold strings and define the path to a sample file.
$list = [System.Collections.Generic.List[string]]::new()
$Path = 'd:\buttondown\emails\wpf-powershell-applications.md'

It is possible to combine the following steps, but I'll keep them separate for clarity. First, I will read the file's contents.
$content = Get-Content -Path $Path

Pay close attention here. When using regular expressions, I included the -Raw parameter to read the file as a single string. However, when using a generic collection, I do not use this parameter. The Get-Content cmdlet reads the file line by line and returns an array of strings. I can then add this array to the generic collection.
$list.AddRange([string[]]$content)
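To see the difference the -Raw parameter makes, here is a small sketch that writes a few sample lines to a temporary file (the content is made up for illustration) and reads it both ways. The [string[]] cast also matters: if a file has only one line, Get-Content returns a single string rather than an array, and the cast ensures AddRange still receives a string array.

```powershell
# Write sample lines to a temporary file (illustrative content only)
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value @(
    '---'
    'id: 379f1f33-af6c-4483-967f-c7b0d955fa51'
    'subject: WPF PowerShell Applications'
    '---'
    'Body text'
)

# Without -Raw: an array with one string per line
$lines = Get-Content -Path $tmp
# With -Raw: the entire file as a single string
$raw = Get-Content -Path $tmp -Raw

$list = [System.Collections.Generic.List[string]]::new()
$list.AddRange([string[]]$lines)

"Lines: $($lines.Count)  Raw is string: $($raw -is [string])  List count: $($list.Count)"
# Each line is now directly addressable by index
$list[1]

Remove-Item -Path $tmp
```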

I know the metadata is bounded by --- lines. If I can find the list index of the opening and closing lines, I can parse the lines in between.
$i = $list.IndexOf('---', 0)

I use the IndexOf method to find the index of the first --- line. The second parameter is optional; it is the index at which to begin searching. Since the metadata header is the first thing in the file, $i will be 0. To find the closing marker, I can use the same method and start searching from the following line.
$j = $list.IndexOf('---', $i + 1)
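IndexOf performs an exact, case-sensitive comparison and returns -1 when it finds no match, so it is worth checking both indices before using them. In this sketch, the in-memory lines are hypothetical, and the closing marker has a trailing space, so IndexOf doesn't find it.

```powershell
# Hypothetical lines: the closing marker has a trailing space
$list = [System.Collections.Generic.List[string]]::new()
$list.AddRange([string[]]@('---', 'status: sent', '--- ', 'Body text'))

$i = $list.IndexOf('---', 0)        # 0
$j = $list.IndexOf('---', $i + 1)   # -1, because '--- ' is not an exact match

# Guard against a missing or malformed marker before slicing the list
if ($i -lt 0 -or $j -lt 0) {
    Write-Warning "Incomplete metadata header (opening: $i, closing: $j)"
}
```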

Now I have the start and end indices of the metadata header. I can extract the lines in between.
PS C:\> $list[$i..$j]
---
id: 379f1f33-af6c-4483-967f-c7b0d955fa51
subject: WPF PowerShell Applications
status: sent
email_type: premium
slug: wpf-powershell-applications
publish_date: 2024-03-22T17:16:45.726699Z
---

Or to be more precise:
PS C:\> $list[($i+1)..($j - 1)]
id: 379f1f33-af6c-4483-967f-c7b0d955fa51
subject: WPF PowerShell Applications
status: sent
email_type: premium
slug: wpf-powershell-applications
publish_date: 2024-03-22T17:16:45.726699Z
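The List[T] class also has a GetRange() method that takes a starting index and a count. Here's a sketch comparing it with the range operator on some sample lines. One reason to consider it: PowerShell's range operator counts down when the end is smaller than the start, so for an empty header ($j equal to $i + 1), ($i + 1)..($j - 1) becomes something like 1..0 and returns the two marker lines. GetRange() with a count of 0 simply returns an empty list.

```powershell
$list = [System.Collections.Generic.List[string]]::new()
$list.AddRange([string[]]@(
    '---'
    'id: 379f1f33-af6c-4483-967f-c7b0d955fa51'
    'subject: WPF PowerShell Applications'
    'status: sent'
    '---'
    'Body text'
))
$i = $list.IndexOf('---', 0)
$j = $list.IndexOf('---', $i + 1)

# Range operator: both ends are inclusive
$viaRange = $list[($i + 1)..($j - 1)]
# GetRange(index, count): a new List[string] holding only the header lines
$viaGetRange = $list.GetRange($i + 1, $j - $i - 1)

# An empty header shows the difference
$empty = [System.Collections.Generic.List[string]]::new()
$empty.AddRange([string[]]@('---', '---'))
$emptyRange = $empty[1..0]             # '---', '---' (descending range)
$emptyGetRange = $empty.GetRange(1, 0) # empty list
```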

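The IndexOf method performs a simple equality comparison. For more complex scenarios, you can use the FindIndex method, which lets you specify a predicate to match against the items in the collection. A predicate is a function that returns $true or $false for each item. In PowerShell, you can pass a script block, which is converted to a [System.Predicate[string]] delegate.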
$i = $list.FindIndex(0, { $args[0] -eq '---' }) + 1
$j = $list.FindIndex($i, { $args[0] -eq '---' }) - 1

The first parameter is the index at which to begin the search, and the predicate is the script block. $args[0] represents an unnamed parameter: the contents of the line being tested. Functionally, my FindIndex syntax does the same thing as IndexOf because I am using the -eq operator to compare the line contents to the --- string. For more complicated searches, I might use the -like or -match operators or a compound expression. Conceptually, think of the predicate as a Where-Object filtering script block.
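If you prefer a named parameter over $args[0], you can cast a script block to [System.Predicate[string]] yourself and reuse it. In this sketch, the regular expression also tolerates trailing whitespace after the marker; whether your files need that is an assumption to verify against your own data. The sample lines are hypothetical.

```powershell
$list = [System.Collections.Generic.List[string]]::new()
$list.AddRange([string[]]@('---', 'status: sent', '--- ', 'Body text'))

# A reusable predicate with a named parameter instead of $args[0].
# '^---\s*$' matches a line containing only the marker, optionally followed by whitespace.
$isMarker = [System.Predicate[string]] { param($line) $line -match '^---\s*$' }

$i = $list.FindIndex(0, $isMarker) + 1    # first metadata line
$j = $list.FindIndex($i, $isMarker) - 1   # last metadata line
$list[$i..$j]
```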
In my FindIndex example, I also adjust the returned index numbers (+1 and -1), so $i is the first line of the metadata and $j is the last. I don't have to deal with the --- markers. With this information, I can process each line and split it on the colon into key/value pairs.
$list[$i..$j] | ForEach-Object -Begin {
    # Initialize the hashtable
    $meta = @{}
} -Process {
    #split into two strings
    $split = $_ -split ':', 2
    $meta.Add($split[0].Trim(), $split[1].Trim())
} -End {
    $meta
}

Note that I am using the -split operator with a limit of 2. This ensures that I only get two strings for each split. Otherwise, the -split operator would split on every colon in the line, which would be a problem for the publish_date line. I also use the Trim() method to remove any leading or trailing whitespace from the key and value strings, just in case.
This leaves me with a hashtable that contains the metadata.
Name                           Value
----                           -----
publish_date                   2024-03-22T17:16:45.726699Z
id                             379f1f33-af6c-4483-967f-c7b0d955fa51
email_type                     premium
subject                        WPF PowerShell Applications
slug                           wpf-powershell-applications
status                         sent
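Here's a quick way to see why the limit matters, using the publish_date line from the sample header. Without the limit, the timestamp is broken into several pieces. The last lines show a hypothetical line without a colon: it produces a single string, which means there is no second element to trim.

```powershell
$line = 'publish_date: 2024-03-22T17:16:45.726699Z'

# No limit: splits on every colon, breaking up the timestamp
$all = $line -split ':'
# Limit of 2: split on the first colon only
$pair = $line -split ':', 2

$all.Count        # 4
$pair[0].Trim()   # publish_date
$pair[1].Trim()   # 2024-03-22T17:16:45.726699Z

# A line without a colon yields only one string, so there is no second element
$noColon = '  - list item without a colon' -split ':', 2
$noColon.Count    # 1
```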

Looks pretty easy, right? Remember what I said last time about knowing your data and trusting that it will be consistent? It turns out that is not the case with these files. Here's a metadata header from another file:
id: 68dbe992-4dd1-41bd-81e5-a09a8de76066
subject: Toolmaking Toolmaking
status: sent
email_type: premium
slug: toolmaking-toolmaking
publish_date: 2024-03-07T18:13:13Z
attachments:
  - 31079a9f-58ce-48ed-a7dc-9c66c1cf966e
---

My code will fail on this because the attachment ID line contains no colon. Splitting it returns a single string, so $split[1] is $null, and calling Trim() on it throws an error. Fortunately, I can ignore the attachment metadata. However, I will add a filter to ensure that I only process lines that contain a colon. This prevents errors on lines that do not conform to the expected format.
$list = [System.Collections.Generic.List[string]]::new()
$Path = 'd:\buttondown\emails\toolmaking-toolmaking.md'
$content = Get-Content -Path $Path
#add to the list
$list.AddRange([string[]]$content)
$i = $list.FindIndex(0, { $args[0] -eq '---' }) + 1
$j = $list.FindIndex($i, { $args[0] -eq '---' }) - 1
$list[$i..$j] | Where-Object { $_ -match ':' } | ForEach-Object -Begin {
    # Initialize the hashtable
    $meta = @{}
} -Process {
    #split into two strings
    $split = $_ -split ':', 2
    $meta.Add($split[0].Trim(), $split[1].Trim())
} -End {
    $meta
}

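This provides the expected result. Because the attachments: line still contains a colon, it passes the filter and adds an attachments key with an empty value, which I can ignore.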
Name                           Value
----                           -----
publish_date                   2024-03-07T18:13:13Z
id                             68dbe992-4dd1-41bd-81e5-a09a8de76066
email_type                     premium
subject                        Toolmaking Toolmaking
slug                           toolmaking-toolmaking
status                         sent
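If you want to skip keys without values as well, one option is a slightly stricter regular expression with named captures. This is a sketch using the header lines shown above; the pattern assumes keys are made of word characters or hyphens. An [ordered] hashtable keeps the keys in header order, and assigning with the index syntax, rather than Add(), overwrites a duplicate key instead of throwing an error.

```powershell
# Header lines from the second file shown above
$header = [string[]]@(
    'id: 68dbe992-4dd1-41bd-81e5-a09a8de76066'
    'subject: Toolmaking Toolmaking'
    'status: sent'
    'email_type: premium'
    'slug: toolmaking-toolmaking'
    'publish_date: 2024-03-07T18:13:13Z'
    'attachments:'
    '  - 31079a9f-58ce-48ed-a7dc-9c66c1cf966e'
)

$meta = [ordered]@{}
foreach ($line in $header) {
    # key, colon, then a value starting with a non-space character.
    # 'attachments:' (no value) and '  - ...' (no key) do not match.
    if ($line -match '^(?<key>[\w-]+)\s*:\s*(?<value>\S.*)$') {
        $meta[$Matches.key] = $Matches.value.Trim()
    }
}
$meta
```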

With the meta hashtable, I can create my custom object using some of the logic I shared last time.
$web = 'https://buttondown.com/behind-the-powershell-pipeline/archive/'
$link = $web + $meta.slug
$obj = [PSCustomObject]@{
    Title     = $meta.subject
    Published = $meta.publish_date -as [datetime]
    Category  = $meta.email_type
    Link      = $link
    Path      = $path
    Status    = $meta.status
}

figure 1
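The -as operator is forgiving: if the conversion fails, it returns $null rather than throwing, so a malformed publish_date simply produces an empty Published value. As far as I know, converting a UTC timestamp ending in Z to [datetime] this way also adjusts it to local time. If you want to keep the original offset, [datetimeoffset] is an alternative. A small sketch using the publish_date value from the first file:

```powershell
$published = '2024-03-22T17:16:45.726699Z'

# -as returns $null instead of throwing when the conversion fails
$bad = 'not a date' -as [datetime]

# Succeeds; the 'Z' (UTC) value is adjusted to local time
$local = $published -as [datetime]

# [datetimeoffset] preserves the offset from the string (UTC here)
$dto = [datetimeoffset]::Parse($published, [cultureinfo]::InvariantCulture)

$null -eq $bad          # True
$dto.Offset             # 00:00:00
$dto.UtcDateTime.Hour   # 17
```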
Let's try the whole directory.
$web = 'https://buttondown.com/behind-the-powershell-pipeline/archive/'
$Path = 'd:\buttondown\emails\*.md'
$list = [System.Collections.Generic.List[string]]::new()
$files = Get-ChildItem $Path
$r = foreach ($file in $files) {
    #write-host $file.fullname
    $content = Get-Content -Path $file.FullName
    #add to the list
    $list.AddRange([string[]]$content)
    $i = $list.FindIndex(0, { $args[0] -eq '---' }) + 1
    # anchor the pattern so only a marker line (optionally followed by whitespace) matches
    $j = $list.FindIndex($i, { $args[0] -match '^---\s*$' }) - 1
    $list[$i..$j] | Where-Object { $_ -match ':' } |
    ForEach-Object -Begin {
        # Initialize the hashtable
        $meta = @{}
    } -Process {
        #split into two strings
        $split = $_ -split ':', 2
        $meta.Add($split[0].Trim(), $split[1].Trim())
    }
    [PSCustomObject]@{
        Title     = $meta.subject
        Published = $meta.publish_date -as [datetime]
        Category  = $meta.email_type
        Link      = $web + $meta.slug
        Path      = $file.FullName
        Status    = $meta.status
    }
    # Clear the list for the next file
    $list.Clear()
}

This took about a second to complete, slightly longer than the regular expression approach. Even with that small overhead, the generic collection has advantages: I can find the header boundaries by index and inspect each line individually. I get the same results as before.
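If I plan to reuse this parsing logic, one option is to wrap it in a function. This is a minimal sketch; the function name ConvertFrom-MetadataHeader and its -Line parameter are hypothetical, not part of any module. It combines a reusable predicate, GetRange() for the header lines, and skipping lines that have no value.

```powershell
function ConvertFrom-MetadataHeader {
    # Hypothetical helper: parses key/value lines between the first two --- markers
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]   # files usually contain blank lines
        [string[]]$Line
    )
    $list = [System.Collections.Generic.List[string]]::new()
    $list.AddRange($Line)

    $isMarker = [System.Predicate[string]] { param($l) $l -match '^---\s*$' }
    $start = $list.FindIndex(0, $isMarker)
    if ($start -lt 0) { return $null }          # no opening marker
    $end = $list.FindIndex($start + 1, $isMarker)
    if ($end -lt 0) { return $null }            # no closing marker

    $meta = [ordered]@{}
    # GetRange avoids the descending-range problem when the header is empty
    foreach ($item in $list.GetRange($start + 1, $end - $start - 1)) {
        $key, $value = $item -split ':', 2
        # skip lines with no colon or an empty value
        if ($null -ne $value -and $value.Trim()) {
            $meta[$key.Trim()] = $value.Trim()
        }
    }
    $meta
}

# Usage with in-memory lines (in a loop, pass Get-Content output instead)
$lines = [string[]]@(
    '---'
    'id: 68dbe992-4dd1-41bd-81e5-a09a8de76066'
    'subject: Toolmaking Toolmaking'
    'status: sent'
    'slug: toolmaking-toolmaking'
    'attachments:'
    '  - 31079a9f-58ce-48ed-a7dc-9c66c1cf966e'
    '---'
    ''
    'Newsletter body'
)
$meta = ConvertFrom-MetadataHeader -Line $lines
$meta
```

In the directory loop, you could call it as ConvertFrom-MetadataHeader -Line (Get-Content -Path $file.FullName) and build the custom object from the result.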
Summary
What I showed you today gives me another way to generate the information and data I want to work with. My code writes a valuable object to the pipeline. However, when building a toolset, it is essential to consider ways to add value beyond the raw data. How might the user want to use this information? What can you add to make the process as effortless as possible? It is these little things that separate a good tool from a great one. We'll dive into this next time.

            (c) 2022-2025 JDH Information Technology Solutions, Inc. - all rights reserved
