More Archive Toolmaking
In the last article, I started down the road of creating tools that I could use to build an archive index for this newsletter. Every previous article can be found at https://buttondown.email/behind-the-powershell-pipeline/archive/. However, the site lacks a search feature.
In the previous article, I showed how I use the Buttondown API to get a list of all emails. I will continue with that data and figure out how to create a content summary with excerpts. I'm still trying to figure out how I want to present this information to you, but first, I need to figure out what it will look like.
Instead of re-running my code to get email data, I'll import the data from the XML file I created in the last article.
$all | Export-Clixml c:\scripts\behind-api-emails.xml
Saving data using the CliXML cmdlets is a great way to keep data for later use, as it preserves the object structure much better than JSON. Working with serialized data is a handy way to save time, and it can also save on API calls, so you don't have to worry about hitting rate limits.
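As a rough sketch of that round trip, the snippet below exports a small object and imports it again. The sample object and the temporary file name are placeholders invented for illustration, not the real API data; with the real file, the import would presumably be something like `$all = Import-Clixml c:\scripts\behind-api-emails.xml`.

```powershell
# Hypothetical sample standing in for one email object from the API
$sample = [PSCustomObject]@{
    subject      = 'Ask Jeff'
    publish_date = [datetime]'2024-01-30T18:05:57'
    metadata     = [PSCustomObject]@{ substack_post_id = 141177432 }
}

# Temporary file used only for this sketch
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'clixml-demo.xml'

# Serialize to disk, then read it back in a later session
$sample | Export-Clixml -Path $path
$restored = Import-Clixml -Path $path

# The date is restored as a DateTime object
$restored.publish_date.GetType().Name

# Nested objects and their values are restored as well
$restored.metadata.substack_post_id

Remove-Item -Path $path
```

The restored objects are deserialized copies, which is why some type names in the output that follows may carry a Deserialized prefix.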
Here's what I'm working with.
PS C:\> $all[0].PSObject.Properties |
Select-Object Name,
@{Name="Type";Expression = {$_.TypeNameOfValue -Replace "(Deserialized\.)?System\.(Management.Automation\.)?",""}},
Value
Name                 Type           Value
----                 ----           -----
id                   String         f12c531b-8007-4c28-a115-91ad69b94362
included_tags        Object
excluded_tags        Object
creation_date        DateTime       2/1/2024 9:11:28 PM
modification_date    DateTime       2/1/2024 11:16:36 PM
publish_date         DateTime       1/30/2024 6:05:57 PM
attachments          Object[]       {}
subject              String         Ask Jeff
canonical_url        String
image                String         https://assets.buttondown.email/bc80e7a3-48…
description          String         January 2024
source               String         import
body                 String         Here we are at the end of the month. Lot…
secondary_id         Int64          175
email_type           String         public
slug                 String         ask-jeff
external_url         String
status               String         imported
metadata             PSCustomObject @{substack_post_id=141177432}
should_send_teaser   Boolean        False
is_comments_disabled Boolean        False
custom_teaser        String
absolute_url         String         https://buttondown.email/behind-the-powersh…
filters              Object[]       {}
analytics            PSCustomObject @{recipients=0; deliveries=0; opens=0; clic…
I'm using the PSObject property to display the properties. I also wanted to see the object type for each property. In the Select-Object statement, I'm using a regular expression pattern to strip down the type name to save space in the output display.
$_.TypeNameOfValue -Replace "(Deserialized\.)?System\.(Management.Automation\.)?",""
When you use the -Replace operator, you can use a regular expression pattern to match and replace text. If you are still learning regular expressions, my pattern is searching for System followed by a literal period (\.). It is also searching for Deserialized followed by a literal period. The ? is a quantifier that means the preceding pattern is optional, so Deserialized. may or may not exist. Likewise, I'm searching for an optional pattern of System.Management.Automation. I only figured out this pattern after looking at the TypeNameOfValue property to see what I needed to remove.
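To see the optional groups at work, here is a small sketch that runs the same pattern against a handful of type names. The sample names are chosen for illustration and are not taken from the actual email data.

```powershell
# The same pattern used in the Select-Object expression
$pattern = '(Deserialized\.)?System\.(Management.Automation\.)?'

# Illustrative type names, with and without the optional parts
$typeNames = 'System.String',
    'Deserialized.System.Object[]',
    'System.Management.Automation.PSCustomObject',
    'Deserialized.System.Management.Automation.PSCustomObject'

# Remove whatever portion of each name the pattern matches
$short = $typeNames | ForEach-Object { $_ -replace $pattern, '' }
$short
```

Each name is reduced to its last segment: String, Object[], PSCustomObject, and PSCustomObject.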
I know I'll need to do something with the body property to create an excerpt. I also expect I'll want these properties.
PS C:\> $all[0..2] | Format-List Subject,Email_Type,Publish_date,absolute_url
subject : Ask Jeff
email_type : public
publish_date : 1/30/2024 6:05:57 PM
absolute_url : https://buttondown.email/behind-the-powershell-pipeline/archive/a
sk-jeff/
subject : Add Some Zip to Your PowerShell
email_type : premium
publish_date : 1/25/2024 6:12:58 PM
absolute_url : https://buttondown.email/behind-the-powershell-pipeline/archive/a
dd-some-zip-to-your-powershell/
subject : Finding Your Way on the System.IO.Path
email_type : premium
publish_date : 1/23/2024 6:07:34 PM
absolute_url : https://buttondown.email/behind-the-powershell-pipeline/archive/f
inding-your-way-on-the-systemiopath/
The properties are from the object returned by the API. The property names are clear, but not what I think of as "good" PowerShell property names. I will want to rename them for my custom output.
PS C:\> $all[0..2] | Select-Object @{Name="Title";Expression = {$_.Subject}},
@{Name="Type";Expression={$_.email_type}},
@{Name="Published";Expression={$_.publish_date}},
@{Name="Link";Expression={$_.absolute_url}} | Format-List
Title : Ask Jeff
Type : public
Published : 1/30/2024 6:05:57 PM
Link : https://buttondown.email/behind-the-powershell-pipeline/archive/ask-
jeff/
Title : Add Some Zip to Your PowerShell
Type : premium
Published : 1/25/2024 6:12:58 PM
Link : https://buttondown.email/behind-the-powershell-pipeline/archive/add-
some-zip-to-your-powershell/
Title : Finding Your Way on the System.IO.Path
Type : premium
Published : 1/23/2024 6:07:34 PM
Link : https://buttondown.email/behind-the-powershell-pipeline/archive/find
ing-your-way-on-the-systemiopath/
I probably don't need the time value of the publish date, but I'll leave it for now. I can always remove it later by formatting the [DateTime] object.
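Dropping the time later could look something like this sketch. The date value is a stand-in for a publish_date, and the yyyy-MM-dd format is only one possible choice.

```powershell
# Stand-in for a publish_date value
$published = [datetime]'2024-01-30T18:05:57'

# A custom format string keeps only the date portion.
# The invariant culture makes the result the same on any system.
$invariant = [System.Globalization.CultureInfo]::InvariantCulture
$dateOnly = $published.ToString('yyyy-MM-dd', $invariant)
$dateOnly

# The same idea as a calculated property (hypothetical usage):
# @{Name = 'Published'; Expression = { $_.publish_date.ToString('yyyy-MM-dd') } }
```

Note that formatting turns the value into a string, so any sorting by date should happen before this step.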
Creating an Excerpt
The next major step is creating an excerpt. My API key gives me full access to the email object, so the Body property is the complete HTML content of the email.
PS C:\Users\Jeff\OneDrive\behind\2024\behind-archive> $all[0].body[0..200] -join ""
Here we are at the end of the month. Lots of content is planned for the year, and big changes are afoot. More on that later. Here are some PowerShell odds and ends.
Extending PSResources
The Body property is a single string, so I'm getting roughly the first 200 characters and then re-joining them as a string. Ideally, I'd like to get the first 100 words or so. However, I don't want HTML tags to count as words.
I can use another regular expression to find words in the text.
[regex]$rx= "\w+(\.)?"
This is close.
PS C:\> ($rx.matches($all[0].body) | Select-Object -ExpandProperty Value -first 50 ) -join " "
p Here we are at the end of the month. Lots of content is planned for the year and big changes are afoot. More on that later. Here are some PowerShell odds and ends. p h2 Extending PSResources h2 p I hope you have updated your PowerShell package management to
This gets me the first 50 words, but some of those words include HTML tags. I'm also assuming I won't have code samples in the first 100 words. I can also see other challenges ahead, but I'll handle those later.
My initial idea is to create a markdown document that I can use to build a static web page. If I want HTML, it is easy to convert a markdown document to HTML. So, instead of immediately stripping off all HTML tags, I'll convert some of them, like headings, to markdown. Let's test with a single object.
$body = $all[0].body
I want to convert the headings and insert line breaks for clean markdown formatting.
$body = $body -replace "<h1>","`n`n# " -replace "<h2>","`n`n## " -replace "<h3>","`n`n### "
$body = $body -replace "</p>","`n`n"
Here's what I have so far.
PS C:\> $body[0..250] -join ""
Here we are at the end of the month. Lots of content is planned for the year, and big changes are afoot. More on that later. Here are some PowerShell odds and ends.
## Extending PSResources
I hope you have updated your PowerShell package m
Note that I am only showing the first 250 characters, not words. I can get words by splitting the string on spaces.
PS C:\> (($body[0..250] -join "").split(" ") | Select -First 10) -join " "
Here we are at the end of the month. Lots
I can verify this with Measure-Object.
PS C:\> (($body[0..250] -join "").split(" ") | Select -First 10) -join " " | Measure-Object -word
Lines Words Characters Property
----- ----- ---------- --------
         10
But I still have HTML tags in the excerpt. I can remove them with another regular expression.
[regex]$rx = "<[^>]+>"
$body = $rx.Replace($body,"")
I need to replace the heading tags before I use this pattern. But now I am getting closer.
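A tiny sketch shows why the order matters. The HTML string is a made-up sample, and it assumes headings appear as plain h2 tags with no attributes.

```powershell
# Made-up HTML sample, assuming a plain <h2> tag with no attributes
$html = '<p>Intro text.</p><h2>Extending PSResources</h2><p>More text.</p>'
[regex]$rx = '<[^>]+>'

# Stripping tags first removes the heading tag, so nothing is left to convert
$stripFirst = $rx.Replace($html, '') -replace '<h2>', "`n`n## "

# Converting the heading first preserves it as markdown
$convertFirst = $rx.Replace(($html -replace '<h2>', "`n`n## "), '')

$stripFirst
$convertFirst
```

Only the second result keeps the heading as a markdown heading. In the first, the heading text runs straight into the surrounding sentences.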
PS C:\> (($body[0..250] -join "").split(" ") | Select -First 100) -join " "
Here we are at the end of the month. Lots of content is planned for the year, and big changes are afoot. More on that later. Here are some PowerShell odds and ends.
## Extending PSResources
I hope you have updated your PowerShell package management
When I apply this to more of the email data, I can see other things I need to re-format. This is why I prefer working with objects. Parsing text is a lot of work, but in this case, it is necessary.
I am not a fan of curly quotes, so I'll replace them with straight quotes using a regular expression.
$body = $body -replace "’","'"
Some of my emails start with a markdown quote block. The content of these blocks tends to be unrelated to the main content of the email. I'll remove them.
$body = $body -replace "(?<=\s*)\>(\s+)?.*(\r)?\n", ''
I need to replace the HTML entities for the > and < characters so that command prompts are rendered correctly.
$body = $body -replace "&gt;",">"
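As an alternative to replacing entities one at a time, .NET can decode them all in one call. This is not the approach used in this article, only a sketch of another option, and the sample string is invented for illustration.

```powershell
# Invented sample containing encoded angle brackets and an ampersand
$text = 'PS C:\&gt; Get-Date &amp; more &lt;Enter&gt;'

# HtmlDecode converts all HTML entities back to their characters
$decoded = [System.Net.WebUtility]::HtmlDecode($text)
$decoded
```

One caution with this approach: decoding turns an encoded sequence like the one at the end of the sample into something that looks like a real tag, so it is probably safer to decode after any tag-stripping step, not before.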
Let's put all of this together.
$z = $all | Select-Object @{Name = 'Title'; Expression = { $_.Subject } },
@{Name = 'Type'; Expression = { $_.email_type } },
@{Name = 'Published'; Expression = { $_.publish_date } },
@{Name = 'Preview'; Expression = {
#replace H tags with markdown
$body = $_.body -replace '<h1>', "`n`n# " -replace '<h2>', "`n`n## " -replace '<h3>', "`n`n### "
$body = $body -replace '</p>', "`n`n"
$body = $body -replace "’","'"
$body = $body -replace "&gt;",">"
[regex]$rx = '<[^>]+>'
#define a new variable
$preview = $rx.replace($body, '')
#strip out markdown quote blocks
$preview = $preview -replace "(?<=\s*)\>(\s+)?.*(\r)?\n", ''
#get the first 100 words
$preview = ($preview -split " " | Select-Object -First 100 ) -join ' '
#append an ellipsis to indicate more
"$preview`n..."
}
},
@{Name = 'Link'; Expression = { $_.absolute_url } } |
Sort-Object Published -Descending

There are still potential formatting issues, but maybe I can clean those up later.
I should also consider creating a function to handle all of the formatting and regex replacements. That would simplify my code. If I want to add more parsing and reformatting, I can do that in the function.
Function parsePreview {
[cmdletbinding()]
Param([string]$Body,[int]$WordCount = 100)
[regex]$rx = '<[^>]+>'
#replace H tags with markdown
$body = $body -replace '<h1>', "`n`n# " -replace '<h2>', "`n`n## " -replace '<h3>', "`n`n### "
$body = $body -replace '</p>', "`n`n"
$body = $body -replace "’","'"
$body = $body -replace "&gt;",">"
#define a new variable
$preview = $rx.replace($body, '')
#strip out markdown quote blocks
$preview = $preview -replace "(?<=\s*)\>(\s+)?.*(\r)?\n", ''
#get the first $WordCount words (100 by default)
$preview = ($preview -split " " | Select-Object -First $WordCount ) -join ' '
#append an ellipsis to indicate more
"$preview`n..."
}
$all | Select-Object @{Name = 'Title'; Expression = { $_.Subject } },
@{Name = 'Type'; Expression = { $_.email_type } },
@{Name = 'Published'; Expression = { $_.publish_date } },
@{Name = 'Preview'; Expression = { parsePreview -Body $_.body } },
@{Name = 'Link'; Expression = { $_.absolute_url } }
I never expect the function to be run outside of the script, so I can use a non-standard name. I'm also not adding parameter validation since I'm just going to use it in my script.
Creating a Summary Document
Let's wrap up today by meeting my interim goal of creating a markdown document. I've saved all the parsed data, sorted by publication date, to a variable.
$e = $all | Select-Object @{Name = 'Title'; Expression = { $_.Subject } },
@{Name = 'Type'; Expression = { $_.email_type } },
@{Name = 'Published'; Expression = { $_.publish_date } },
@{Name = 'Preview'; Expression = { parsePreview -Body $_.body } },
@{Name = 'Link'; Expression = { $_.absolute_url } } |
Sort-Object Published -Descending
Now, I want to create a markdown document. The preview text already contains markdown elements, so I'll just need to add markdown elements around that. Since I will eventually create a file, I'll use a generic list to store the markdown content.
$doc = [System.Collections.Generic.List[string]]::new()
$doc.Add("# Behind the PowerShell Pipeline Archive`n")
The first line is the title of the document. I'll add a line break to separate the title from the content. Next, I need to iterate through the collection of email summaries and add them to the document.
foreach ($item in $e) {
I'll add the title and link to the document.
$doc.Add("## [$($item.Title)]($($item.Link))")
I need to show when the article was published, and I'd like to indicate if it is public or premium content. I could display the Type property, but maybe an emoji would be more fun.
if ($item.Type -eq 'premium') {
$emoji = ':heavy_dollar_sign:'
}
else {
$emoji = ':globe_with_meridians:'
}
$doc.Add("***Published***: $($item.Published) UTC $emoji")
If the preview contains headings, like ##, I should make it ### to maintain proper markdown structure. I'll use another regular expression to do this.
$short = $item.Preview -replace "#{1,2}(?=\s)", '###'
$doc.Add($short.Trim())
The pattern is searching for one or two # characters followed by a whitespace character. The ?= is a positive lookahead assertion. It is a zero-width assertion, so the whitespace must be present for a match but is not included in the text that gets replaced. The ### is the replacement text. The Trim method removes any leading or trailing whitespace.
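Here is a quick sketch of the lookahead against a few strings. The sample lines are invented for illustration.

```powershell
# The same pattern and replacement used for the preview text
$pattern = '#{1,2}(?=\s)'

# Invented sample lines
$samples = '# Top',
    '## Extending PSResources',
    '#requires -Version 7',
    '### Details'

$results = $samples | ForEach-Object { $_ -replace $pattern, '###' }
$results
```

The first two lines become level 3 headings. The #requires line is untouched because the # is not followed by whitespace. The last line comes out with four # characters, since the pattern matches the final two of the original three, so an existing level 3 heading is pushed down one more level.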
Here's the complete code to create the markdown document.
$e = $all | Select-Object @{Name = 'Title'; Expression = { $_.Subject } },
@{Name = 'Type'; Expression = { $_.email_type } },
@{Name = 'Published'; Expression = { $_.publish_date } },
@{Name = 'Preview'; Expression = { parsePreview -Body $_.body } },
@{Name = 'Link'; Expression = { $_.absolute_url } } |
Sort-Object Published -Descending
$doc = [System.Collections.Generic.List[string]]::new()
$doc.Add("# Behind the PowerShell Pipeline Archive`n")
foreach ($item in $e) {
if ($item.Type -eq 'premium') {
$emoji = ':heavy_dollar_sign:'
}
else {
$emoji = ':globe_with_meridians:'
}
$doc.Add("## [$($item.Title)]($($item.Link))")
#insert blank line
$doc.Add('')
$doc.Add("***Published***: $($item.Published) UTC $emoji")
$doc.Add('')
#reduce any headings in the preview
$short = $item.Preview -replace "#{1,2}(?=\s)", '###'
$doc.Add($short.Trim())
$doc.Add('')
}
$doc | Out-File c:\work\behind-test.md
The document looks promising at first.

But then things get wonky.

In looking at the raw markdown document, I can see that I am missing closing code fences for the code blocks. And I bet once I fix that, I'll find a few other things I need to address. But not today.
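One possible way to detect the problem is to count the fence markers in each preview, since an odd count means a code block was opened but cut off before it closed. This is only a sketch under that assumption, using an invented preview string, and not necessarily how the fix will end up.

```powershell
# Build the three-backtick fence marker
$fence = '`' * 3

# Invented preview that opens a code block but is truncated before it closes
$preview = "Some intro text.`n$fence`nGet-Process | Select-Object -First 5`n..."

# Count how many fence markers the preview contains
$count = [regex]::Matches($preview, [regex]::Escape($fence)).Count

# An odd count means a fence was left open, so append a closing one
if ($count % 2 -eq 1) {
    $preview = "$preview`n$fence"
}
$preview
```

A check like this could live in the parsePreview function, after the text has been trimmed to the word count.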
Summary
I hope you are seeing that building PowerShell tooling is an iterative process. Don't expect to write the final version of your code the first time. Work in small chunks and test often. Be prepared to find new issues as you address other issues. Even though I've spent a lot of work parsing text, ultimately I am trying to build objects that I can use. Right now, I am using that object to create a markdown document. But maybe I'll store it in a database, or an XML file.
I'll come back to this project in the near future. Please feel free to leave comments and questions.