multithreading commands

In this age of many companies adopting the ideas of either DevOps or SRE (Site Reliability Engineer) roles, there is constant need for automation. Mundane tasks like collecting metrics, alerts, server and database asset info into a central database on which you can do reporting is now a common task that many DBAs or system admins are performing.

In SQL Server world, we are fortunate to have a vibrant community of PowerShell enthusiasts that have open sourced dbatools – a community driven PowerShell module to automate database development and administration.

This begs a question..

Why do we need automation?

We need automation to get rid of repetitive tasks that we have to perform on a daily basis. It is often said:

If you have to repeat it, better automate it.

This makes sense as the room for common errors is alleviated when the same steps you are taking are automated. Also, with ever increasing data needs, the infrastructure footprint is ever increasing. To meet this ever increasing demand, the only way for a team to scale is to adopt automation.

For example, being a SQL Server DBA, you might have been involved in server migration projects wherein you are tasked to migrate from older version of SQL server (SQL server 2008, SQL server 2008R2) to a more modern & supported version (SQL server 2014, 2016, 2017, etc.).

In the stone age (before dbatools), we as DBA’s spent countless hours scripting pre-migration tasks like logins, SQL Agent jobs, linked servers, system database custom objects, resource governor settings, certificates, etc. Even doing all this effort, often we miss few critical things due to non-standard process. dbatools takes care of the entire migration including pre-migration steps and migrating all or few databases from one instance to another along with error handling and logging. This is awesome, isn’t it?

Once good thing about dbatools is that it has lots of commands to get all kinds of information that you want to gather. The commands name have the prefix Get-Dba*. Another beauty is the ability to pipe the output of the commands and write to SQL tables using Write-DbaDataTable command.

In a large organization where you have hundreds of servers to manage and you often face a situation where you want to query all your SQL server estate. For example, checking total RAM and the max memory setting or you want to get version and service pack info for your entire SQL Server estate. Doing these tasks single threaded against a ton of servers is often painful because it takes lot of time!

Multithreading to the rescue

What is multithreading and how does PowerShell offer multithreading?

In simple terms, multithreading is the ability to run multiple threads on same or separate processors at the same time resulting in parallel execution.

PowerShell provides many ways to allow multi-threading in your scripts. Let’s explore the most common ways:

  • Background jobs: A powerful way of allowing commands to run in the background and then retrieve the results later thereby freeing up the terminal and allowing the users to continue their work without interruption. Whenever a new job is started using Start-Job, it ends up spawning a new powershell.exe process which you can easily see using windows task manager or just running Get-Process command. This means that the more background jobs you run, the more powershell.exe processes are created which is a resource hog! Also, there is no ability to throttle the jobs (i.e. all jobs are started and executed at the same time). There is limited ability to manage multiple threads and have them share variables as well.
  • PowerShell Runspaces: A powerful and flexible way of leveraging .NET’s system.management.automation.runspaces namespace that exposes variety of classes designed to create, manipulate and orchestrate a pool of PowerShell processes. In simple terms, think of Runspaces as containers where everything is contained or stored and ran in an asynchronous fashion.

How to efficiently leverage PowerShell Runspaces with high performance and less overhead ?

Boe Prox has open sourced his awesome PoshRSJob module that provides an alternative to native PowerShell jobs with greater performance and less overhead to run commands in the background, freeing up the console and allowing throttling on the jobs.

We can use this module to leverage multi-threading in dbatools as well. The PoshRSJob module is well-documented, so in this blog post, I will just get into the details of how to leverage PoshRSJob module with dbatools.

As of writing this blog post, there are total of 514 functions in 0.9.722 version of dbatools out of which 192 are for fetching data (Get-Dba*) from your SQL server instances. How did I get those? Easy enough:

# How many functions are there in dbatools?
Get-Command -Module dbatools -Type Function | Measure-Object

# How many functions are there in dbatools that fetch data from SQL Server instances?
Get-Command -Module dbatools -Verb Get | Where CommandType -eq Function | Measure-Object

Now let’s say you are tasked to get all the databases that are part of a given Availability Group. There are two ways to approach this: the traditional (sequential) way and multithreading.

The traditional sequential way

# Get the list of servers. You can read directly from a text file or you can connect to your central database to get the list using tsql.
$servers = Get-Content D:\DBA\serverList.txt

# store credential in a variable which will be used for SQL authentication to your sql servers
$cred = Get-Credential -UserName sqladmin -Message "Enter your sqladmin Password"

# use foreach to loop through the collection and get the desired data
foreach ($server in $servers) {
    # here you can use any dbatools commands (Get-Dba*)
    Get-DbaAgDatabase -SqlInstance $server -SqlCredential $cred
}

Using PoshRSJob module to leverage Runspaces & Multithreading

# Get the list of servers. You can read directly from a text file or you can connect to your central database to get the list using tsql.
$servers = Get-Content D:\DBA\serverList.txt

# store credential in a variable which will be used for SQL authentication to your SQL Servers
$cred = Get-Credential -UserName sqladmin -Message "Enter your sqladmin Password"

# PoshRSJob has throttle parameter. I usually keep it to the number of processors installed on the server
$throttle = $env:NUMBER_OF_PROCESSORS

# Multithread! See below for a breakdown
Start-RSJob -InputObject $servers -Throttle $throttle -ModulesToImport dbatools -ScriptBlock {
    Param($server)
    Get-DbaAgDatabase -SqlInstance $server -SqlCredential $Using:cred
}

# Get-RSJob will display all jobs that are currently available to include completed and currently running jobs.
# Receive-RSJob Gets the results of the Windows PowerShell runspace jobs in the current session. Also you can use -OutVariable variableName e.g. -OutVariable Results and then do $Results to get all the output
Get-RSJob | Where-Object  {{State -like "Completed"} -and  {HasMoreData -like "False"} -and {HasErrors -like "False"} }  | Receive-RSJob

# cleanup only the jobs that are completed, does not have more data and no errors
Get-RSJob | Where-Object  {{State -like "Completed"} -and  {HasMoreData -like "False"} -and {HasErrors -like "False"} } | Remove-RSJob

Breakdown

-InputObject $servers

Here we are passing the collection item – $servers as an input object that will take that object and use it as the first parameter in the script block as long as you add a Param() block in it.

Param($server)

This is added so that the -InputObject collection can be used as the first parameter

-ModulesToImport dbatools

Here you can use any dbatools commands as all the commands are imported as part of the module import

-SqlCredential $Using:cred

This is pretty cool way of passing a local variable to the script block i.e. you can use variables from parent PS Process into PSJob’s scriptblock

On a side note, if you want to get the results or output of Start-RSJob into a variable,  you can do it using $results = Get-RSJob | Receive-RSJob or Get-RSJob | Receive-RSJob -OutVariable Results (now you can use $Results same like former example) and then you can work with the result set as usual, and even write the entire result set to a database using Write-DbaDataTable.

Results

Below test shows the performance difference between Sequential Execution vs Multithreaded execution using PoshRSJob:

  • Total Servers in the serverList.txt: 134
  • Total No. of logical CPUs: 16
dbatools command Sequential Execution PoshRSJob PercentGain
Get-DbaAgDatabase TotalSeconds: 123.13 TotalSeconds: 30.75 300.42 %

Note: When running using Start-Job, the CPU is pegged to 100% (since it spawns individual powershell.exe processes and does not have native throttling option) and it uses almost all the available RAM than the PoshRSJob. The execution time for Start-Job is more than double and exhaust CPU and available RAM on the machine.

References:

Thanks for reading !
~ Kin Shah

Get in touch @TheRockStarDBA or at dba.se

7 thoughts on “multithreading commands

  1. Seth Reply

    Hi, Kin,

    Thank you for showing this to us. I have seen improvements already!

    I’m wondering how you got the timed test results of the sequential executions versus the PoshRSJobs executions?

    Thanks!

  2. Kin Post authorReply

    Hi Seth,

    Glad you liked it. For measuring the execution, you can use

    `Measure-Command {
    ## your script here !
    }`

    • Seth Reply

      Kin,

      Thank you for the tip. I might not be using it correctly, however. For instance:

      Measure-Command {
      Start-RSJob … {
      My script here…
      }
      }

      This just gives me how long it takes to get started, or rather, how long it takes to just execute the Start-RSJob command. After this command is complete, there are still jobs that have not yet finished, and I would like to know how long it takes all the started jobs to complete. Putting Measure-Command inside Start-RSJob just results in how long it takes each parallel command to complete. While I could sum this information, the problem is that I then don’t see the output of the script I originally created.

      Hope that makes sense!

      Seth

  3. Harshan Reply

    Kin,
    Really good article! While I don’t get deep into the technical details, this is highly valuable information to lot of the tech folks. I’ll be sure to share this with some of the tech community I know. Keep sharing and great to see your continued passion in what you do!

    Harshan

  4. Kin Reply

    @Divya – sorry for replying late.

    You can have a text file containing a list of servers or you can just query a central database and generate a list of servers that you want to query using multithreading.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.