Behind the PowerShell Pipeline

October 11, 2024

Creating a GitHub Repository Tool - Part 2

I've been sharing my experiences and work process in building a PowerShell tool to manage my GitHub repositories. Instead of fussing with the GitHub API, I've been using the gh command-line tool. One terrific feature is that I can create JSON output with only the fields I need. Although, as I showed you last time, I still need to organize the output into a meaningful PowerShell object.

Importing and Exporting

Using the concepts and techniques I've shown you so far, I can meet a lot of my requirements. For example, I don't always need the most up-to-date information on my repositories. Instead of worrying about API rate limits, I can work with offline data. I wrote this PowerShell function to export my GitHub repository information to a JSON file.

Function Export-GitHubRepos {
    [cmdletbinding()]
    Param(
        [int32]$Count = 200,
        [string]$Path = 'C:\scripts\jdhitsolutions-github.json'
    )

    #JSON field names are case-sensitive
    #don't include spaces between field names
    $fields = 'id,name,url,description,diskUsage,pushedAt,updatedAt,latestRelease,stargazerCount,watchers,defaultBranchRef,visibility'
    #do not include my repos that are forks
    gh repo list --no-archived --source -L $count --json $fields |
    Out-File -FilePath $Path -Encoding utf8

    Get-Item -Path $Path
}

A few notes on this function. There's no way to tell GitHub, "Give me everything." By default, gh repo list returns only 30 results, so if you want more, you have to specify a count with -L. Fortunately, when you run gh repo list, the output shows you how many repositories you have. As long as I set a count greater than the total number of repositories, I'll get everything.
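One way to guard against silently truncated results is to compare the number of items that came back with the limit that was requested. This is only a minimal sketch and not part of the export function. The helper name Test-RepoLimit and the sample JSON are hypothetical.

```powershell
# Hypothetical helper: report whether the item count stayed below the limit.
# Reaching the limit suggests gh may have stopped before listing everything.
Function Test-RepoLimit {
    [CmdletBinding()]
    [OutputType([bool])]
    Param(
        [Parameter(Mandatory)]
        [object[]]$Data,
        [Parameter(Mandatory)]
        [int32]$Count
    )

    if ($Data.Count -ge $Count) {
        Write-Warning "Found $($Data.Count) items with a limit of $Count. Results may be truncated."
        $false
    }
    else {
        $true
    }
}

# Sample data standing in for gh output
$json = '[{"name":"RepoA"},{"name":"RepoB"},{"name":"RepoC"}]'
$repos = $json | ConvertFrom-Json

Test-RepoLimit -Data $repos -Count 200   # limit comfortably above the total
Test-RepoLimit -Data $repos -Count 3     # limit reached, so a warning is written
```

If the check returns False, raising the count and exporting again is the simplest remedy.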

After more testing, I also added a parameter to the gh command. I am using --source to show only non-fork repositories. In other words, I don't need to see my repositories that are forks of other repositories; I only want to see the repositories I've created. I also recommend specifying utf8 encoding when exporting data to a JSON file. This will ensure that any special characters or emojis are preserved.
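To illustrate the encoding point, here is a small round-trip sketch. It only demonstrates Out-File and Get-Content with an explicit utf8 encoding; it does not involve gh at all. The file name and sample text are made up for the example.

```powershell
# Build a string containing an emoji without relying on this script's own file encoding
$rocket = [char]::ConvertFromUtf32(0x1F680)
$text = "Launch day $rocket"

$tmp = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'utf8-demo.txt'

# Write with an explicit UTF-8 encoding, as the export function does
$text | Out-File -FilePath $tmp -Encoding utf8

# Read it back, again naming the encoding explicitly
$roundTrip = Get-Content -Path $tmp -Encoding utf8

# True when the emoji survived the round trip
$roundTrip -eq $text

Remove-Item -Path $tmp
```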

I want to pause here and mention a design decision I had to ponder. I have code to create rich PowerShell objects from the GitHub JSON data. Why not export those objects? The challenge, I realized, is importing the objects back into PowerShell. JSON won't capture the type information. If I get the contents of the JSON file and send them through ConvertFrom-Json, I'll get a custom object with all of the properties, but it will be a generic PSCustomObject. There are ways I can inject a type name, but that adds a little complexity to the process. I could use Export-Clixml, but then the file is harder to read and tends to be a little larger than JSON.
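For reference, one common way to inject a type name is to insert it into the object's TypeNames collection. This is a minimal sketch using made-up sample JSON, and it is not necessarily the approach this tool will end up using.

```powershell
# Sample JSON standing in for one exported repository
$json = '{"name":"SampleRepo","visibility":"PUBLIC"}'

$obj = $json | ConvertFrom-Json

# Before: the first type name is the generic custom object type
$before = $obj.PSObject.TypeNames[0]

# Insert a custom type name at the top of the object's type name list
$obj.PSObject.TypeNames.Insert(0, 'GitHubRepoInfo')

$before
$obj.PSObject.TypeNames[0]
```

The properties are untouched. Only the type name list changes, which is enough for custom formatting and type extensions to recognize the object.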

I decided that exporting the raw JSON data was the best approach. This makes it easier to import the raw JSON as I continue developing commands. I can use this cached data instead of hitting the GitHub API. Of course, I can always import the data and convert it to rich PowerShell objects. Here's my import function.

Function Import-GitHubRepos {
    [CmdletBinding()]
    Param(
        [Parameter(Position=0,HelpMessage="The path to JSON file created with Export-GitHubRepos.")]
        [ValidateScript({Test-Path $_})]
        [string]$Path = 'C:\scripts\jdhitsolutions-github.json'
    )

    Write-Verbose "Importing repository data from $Path"
    $data = Get-Content -Path $Path | ConvertFrom-Json
    Write-Verbose "Importing $($data.Count) repositories"

    foreach ($item in $data) {
        #DateTime values from GitHub are in UTC
        #I am converting them to local time
        Write-Verbose "Processing $($Item.Name)"
        if ($item.latestRelease) {
            $latestPublished = $item.latestRelease.publishedAt.ToLocalTime()
            $latestRelease = $item.latestRelease.name
        }
        else {
            #set both variables to Null
            $latestPublished = $latestRelease = $null
        }
        #this import process does not handle emojis
        [PSCustomObject]@{
            PSTypeName        = 'GitHubRepoInfo'
            Name              = $item.name
            Description       = $item.description
            LastUpdate        = $item.updatedAt.ToLocalTime()
            Visibility        = $item.visibility
            DefaultBranch     = $item.defaultBranchRef.name
            LatestReleaseName = $latestRelease
            LatestReleaseDate = $latestPublished
            LastPush          = $item.pushedAt.ToLocalTime()
            StargazerCount    = $item.stargazerCount
            WatcherCount      = $item.watchers.totalCount
            URL               = $item.url
            DiskUsage         = $item.diskUsage
            ID                = $item.id
        }
    } #foreach item
    Write-Verbose "Import complete"
}
Figure 1: Importing GitHub repo data

The import process doesn't have the emoji-related code. I also realized that the datetime values from GitHub are in UTC. I am converting them to local time.

LastPush          = $item.pushedAt.ToLocalTime()
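The conversion can be seen in isolation with a hand-built value. This sketch assumes the timestamp is already a DateTime with a Kind of Utc, which is what ConvertFrom-Json in PowerShell 7 appears to produce for these fields. The date itself is a sample value.

```powershell
# A UTC timestamp like the ones GitHub reports (sample value)
$utc = [datetime]::new(2024, 10, 11, 14, 30, 0, [System.DateTimeKind]::Utc)

# ToLocalTime() shifts the value by the local offset and marks it as local
$local = $utc.ToLocalTime()

$utc.Kind      # Utc
$local.Kind    # Local

# Both values describe the same instant, so converting back should match
$local.ToUniversalTime() -eq $utc
```

If the property arrives as a plain string instead, as it may in Windows PowerShell, it would need to be parsed into a DateTime before ToLocalTime() is available.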

Of course, I can always get the content of the JSON file and use that in my development process.
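As a rough illustration of working with the cached data directly, the following sketch writes some sample JSON to a temporary file and queries it offline. The file name, repository names, and star counts are all invented for the example.

```powershell
# Sample data and a temporary file standing in for the exported JSON
$json = '[{"name":"RepoA","stargazerCount":12},{"name":"RepoB","stargazerCount":40},{"name":"RepoC","stargazerCount":3}]'
$tmp = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'repo-cache-demo.json'
$json | Out-File -FilePath $tmp -Encoding utf8

# Read the cache as a single string and convert it to objects
$raw = Get-Content -Path $tmp -Raw | ConvertFrom-Json

# Query the offline data: repositories with more than 10 stars, most popular first
$popular = $raw |
    Where-Object { $_.stargazerCount -gt 10 } |
    Sort-Object -Property stargazerCount -Descending |
    Select-Object -ExpandProperty name

$popular

Remove-Item -Path $tmp
```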

Defining a Class

I have also reached another design decision. At this point, I realize I have code to define my custom object in at least two places, the Get and Import functions. If I make a change in the object design, I need to change it in multiple files. Instead of defining a custom object with a hashtable, if I define a PowerShell class, I can separate the code to define the object from the commands that need to use the object.

I'll put my class definitions in a separate PowerShell script file. You can probably sense that I will eventually want to create a module, but for now I'll stick to dot-sourcing separate script files. Note, though, that I am writing my code for PowerShell 7 from this point forward.
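Dot-sourcing a definitions file can be sketched like this. The file name and the DemoVisibility enum are hypothetical stand-ins, and the temporary file exists only so the example runs on its own.

```powershell
# Create a throwaway script file that holds a type definition (hypothetical names)
$tmp = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'demo-classes.ps1'
@'
Enum DemoVisibility {
    Public
    Private
}
'@ | Out-File -FilePath $tmp -Encoding utf8

# Dot-source the file so its definitions load into the current scope
. $tmp

# The enum is now available to commands that run afterwards
$value = [DemoVisibility]::Private
$value

Remove-Item -Path $tmp
```

In practice the definitions file would sit next to the function files and be dot-sourced first, so that the functions can rely on the types it defines.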

Using my hashtable as a guide, I can define the class like this.

Class GitHubRepoInfo {
    [string]$Name
    [string]$Description
    [DateTime]$LastUpdate
    [ghRepoVisibility]$Visibility
    [string]$DefaultBranch
    [string]$LatestReleaseName
    [DateTime]$LatestReleaseDate
    [DateTime]$LastPush
    [int32]$StargazerCount
    [int32]$WatcherCount
    [string]$URL
    [int32]$DiskUsage
    [string]$ID

    #this class has no methods
    #overloaded constructor
    GitHubRepoInfo([string]$Name) {
        $this.name = $Name
    }
    GitHubRepoInfo([string]$Name,[string]$url,[ghRepoVisibility]$Visibility) {
        $this.Name = $Name
        $this.Url = $url
        $this.Visibility = $Visibility
    }
}

Because the Visibility property can only be Private or Public, at least for my purposes, I can define the possible values as an enumeration.

Enum ghRepoVisibility {
    Public
    Private
}
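To see how the class and enumeration behave together, here is a short sketch that uses trimmed copies of both definitions so it can run on its own. The repository name and URL are placeholders.

```powershell
# Trimmed copies of the definitions so this sketch is self-contained
Enum ghRepoVisibility {
    Public
    Private
}

Class GitHubRepoInfo {
    [string]$Name
    [string]$URL
    [ghRepoVisibility]$Visibility
    [DateTime]$LatestReleaseDate

    GitHubRepoInfo([string]$Name) {
        $this.Name = $Name
    }
    GitHubRepoInfo([string]$Name, [string]$url, [ghRepoVisibility]$Visibility) {
        $this.Name = $Name
        $this.URL = $url
        $this.Visibility = $Visibility
    }
}

# Single-argument constructor
$a = [GitHubRepoInfo]::new('SampleRepo')

# Three-argument constructor. The string is converted to the enum and the
# conversion ignores case, so 'PRIVATE' works as well as 'Private'.
$b = [GitHubRepoInfo]::new('SampleRepo', 'https://example.com/SampleRepo', 'PRIVATE')

$a.Visibility          # Public, the first enum member, because nothing was assigned
$b.Visibility          # Private
$a.LatestReleaseDate   # DateTime.MinValue, because a [DateTime] property cannot hold $null
```

Two details are worth keeping in mind. Because the class defines its own constructors, there is no parameterless constructor, so [GitHubRepoInfo]::new() and a hashtable cast will not work unless one is added. And since a [DateTime] property cannot be set to $null, a repository without a release is probably best handled by leaving LatestReleaseDate unassigned.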