I've been intending for a couple of months to write about how to shut down an Azure Stack integrated system 'the right way'. Why? Because I had to turn off an instance a couple of months ago: the location hosting the appliance had planned utility maintenance (it hosts pilot/demo kit only, so there's no need for generators), and I didn't want any issues with tenant workloads or S2D. Anyway, I don't particularly need to detail the process now, as Microsoft have recently updated their documentation to cover it (get it here).
So, why did I feel the need to make this post?
Primarily, it's to highlight the importance of regularly checking back on the Azure Stack doc pages; Microsoft are constantly adding to and updating the guidance, especially when new code updates are released. The link provided pertains to version 1712 and above. The Test-AzureStack PowerShell cmdlet has been added in this version, allowing the operator to confirm that all required Azure Stack roles and services are functioning.
Secondly, I wanted to give some more detail on the steps, the output you will see, and the time each stage can take.
Here are the high-level steps of what happens when you stop Azure Stack:
Connect to the Privileged Endpoint via PowerShell Remoting
$Pep = 'azs-ercs01'
$cred = Get-Credential -UserName 'azurestack\cloudadmin' -Message 'Enter CloudAdmin Password'
Enter-PSSession -ComputerName $Pep -ConfigurationName PrivilegedEndpoint -Credential $cred
Change the $Pep variable to match the name or IP address of one of the ERCS VMs. Change the domain name to match that defined when the integrated system was deployed or leave as-is for ASDK deployments.
Start the shutdown procedure
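From within the PEP session, the shutdown is kicked off with the Stop-AzureStack cmdlet described in the Microsoft documentation. A minimal sketch, assuming you are already connected to the Privileged Endpoint as shown above:

```powershell
# Run inside the PrivilegedEndpoint session opened with Enter-PSSession.
# This begins the orderly shutdown of the whole stamp, ending with the physical nodes.
Stop-AzureStack
```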
The following tasks are carried out when you run the command:
The tenant VMs are shut down (actually saved, if you were to check Hyper-V Manager)
This includes servers required for PaaS
ADFS and WAS (portals) are shut down
Fabric Ring services are shut down (Resource Providers)
Azure Consistent Storage VMs are shut down
Azure core infra SQL servers are shut down
Gateway VMs are shut down
Software Load Balancer VMs are shut down
Border Gateway Protocol VM is shut down
Certificate Authority VMs are shut down
Network Controller VMs are shut down
Finally, the physical nodes are shut down
The time it takes to complete depends on the number of tenant workloads and PaaS servers you have running.
If you close down the session that you ran the command from, you can still check on the progress by connecting to the PEP again and running the following command:
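A sketch of that check, assuming it mirrors the start-up status check and that Get-ActionStatus accepts the name of the action to query (the form Microsoft documents for Start-AzureStack):

```powershell
# Reconnect to the PEP first (same Enter-PSSession call as earlier), then query the action.
# Assumption: the stop action is queried the same way as the documented start action.
Get-ActionStatus Stop-AzureStack
```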
For some reason, the logging is not as 'verbose' as I would like.
Rest assured, things are happening, although it's not obvious! I did try running the command again once it had completed, and that time I did see the correct verbose messages. Not sure what happened the first time round.
Here's the startup process:
Wait for DCs to start
Wait for storage to be ready (S2D cluster)
Start Network Controller VMs
Start Certificate Authority VMs
Wait for Certificate Authority Service
Validate Certificate Authority
Start BGP VMs
Start SLB VMs
Start Gateway VMs
Start GW service
Start SQL VMs
Start SQL Cluster
Azure Consistent Storage VMs are started
Fabric Ring services are started (Resource Providers)
ADFS and WAS (portals) are started
Wait for WAS (admin) portal start-up
Wait for WAS (Public) portal start-up
Finally, the tenant VMs are resumed
If you want to see what the progress is, use the Get-ActionStatus cmdlet:
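From the PEP session, the action name is passed to the cmdlet. A minimal sketch, assuming you are connected to the Privileged Endpoint:

```powershell
# Run inside the PrivilegedEndpoint session.
# Reports the status of the Start-AzureStack action.
Get-ActionStatus Start-AzureStack
```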
As with stopping the instance, times will vary. Expect it to take between 1 and 2 hours, depending on tenant workloads and whether you have any PaaS services installed. I had to run the command again as things appeared to have stalled. It didn't have any adverse effect.
Just a note on this: Although I ran the Start-AzureStack command again, I also ran the Test-AzureStack command in parallel. It reported that all tests passed, so go figure what was actually happening. I trust the tests, so use those as the gate to release the instance back into production.
All being well, once the Start-AzureStack command has completed, you'll have a fully operational system. You *could* resume normal operations and trust it's working. For peace of mind, I prefer to know that everything is working before letting it back into the wild.
The Test-AzureStack cmdlet runs a number of tests that will give you that reassurance.
Run the command from the PEP and after a few minutes you should see a report on components that have passed or failed (hopefully not!).
For anything that doesn't pass, you're going to need to speak to Microsoft support :(
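In its simplest form the cmdlet can be run without any parameters. A minimal sketch, again assuming an open PEP session:

```powershell
# Run inside the PrivilegedEndpoint session.
# Runs the validation tests and reports the result for each component.
Test-AzureStack
```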
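Remember to close the PEP session. Either exit the remote session as you would any other, or use Close-PrivilegedEndpoint to also copy the session transcripts to a file share:
Close-PrivilegedEndpoint -TranscriptsPathDestination '\\yourserver\share' -Credential (Get-Credential)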