
Azure Portal CPU Graph: Bug or Feature?

I support an Azure PaaS application.

We had a brief outage recently.

Given that the code had not been changed in a month, we suspected some maintenance in an Azure data center stepped on our application.

Ping tests and self-tests failed for approximately 10 minutes.

The outage resolved on its own without intervention.

I submitted a ticket to Azure Support to determine the cause of the outage, but the reason I'm writing this post is the behavior I observed in the CPU graphs for Cloud Services while investigating it.

The CPU graphs show different results depending on the time range selected.

I expected the CPU spike to show the same value no matter which time range I selected. But to see the spike that fired the alert, I had to "Edit" the chart and try different time ranges. It wasn't until I selected a narrow custom time range that the graph displayed the CPU spike corresponding to the alert. The alert fires if the CPU percentage exceeds 80% over 15 minutes. So, if you "know" something happened, try different time ranges, especially the custom range, to find what you are looking for.

This behavior has been documented and forwarded to the Azure portal team for review. It appears in both the Classic and current Azure portal.

Here is the response from Azure Support when I raised this concern:
"I have had discussion with our Azure UI team and Azure Monitoring team regarding the portal graph.

As they mentioned, When we look at the 24 hours of data in the portal, the data is aggregated at 1 hour granularity and the average is shown. Similar is the case for 1 week of data shown on the portal. Since the spike exist for 5 to 10 minutes, we need to see the custom data option instead of using the 24 hour and 1 week. These 24 hours/ 1 week graph will be helpful when you have spike for more than an hour."

The CPU spikes are lower in the graphs that have a longer time range because of the aggregation and averaging. This is not a bug in the Azure graphs; it's a feature. ;-)
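To see why averaging flattens a short spike, here is a minimal Python sketch. It does not call any Azure API. The per-minute samples, the 20% baseline, the 95% spike value, and the 5-minute bucket size are all made-up numbers for illustration; only the 1-hour granularity and the "5 to 10 minutes" spike length come from the support response above.

```python
from statistics import mean


def bucket_averages(samples, bucket_size):
    """Average consecutive samples in fixed-size buckets."""
    return [
        mean(samples[i:i + bucket_size])
        for i in range(0, len(samples), bucket_size)
    ]


# Hypothetical data: one CPU sample per minute for 24 hours.
BASELINE = 20.0  # assumed idle CPU percentage
SPIKE = 95.0     # assumed CPU percentage during the spike

samples = [BASELINE] * (24 * 60)

# A 10-minute spike starting at 13:20, entirely inside one hour.
spike_start = 13 * 60 + 20
for minute in range(spike_start, spike_start + 10):
    samples[minute] = SPIKE

hourly = bucket_averages(samples, 60)      # like the 24-hour view
five_minute = bucket_averages(samples, 5)  # like a narrow custom range

print(max(hourly))       # 32.5
print(max(five_minute))  # 95.0
```

With these numbers, the hour containing the spike averages (50 × 20 + 10 × 95) / 60 = 32.5%, so the 24-hour view never comes close to the 80% alert threshold, while the finer buckets show the full 95%.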







