Creating A Modern Azure Diagnostics Policy That Generative AI Couldn't - Yet!
Background
Before diving into the detail, I think it’s important to set the context, as all environments are different and not everyone’s requirements are the same. This particular organisation had a large enterprise environment aligned to Microsoft’s Cloud Adoption Framework Landing Zone Design, so they had multiple Subscriptions under a number of well structured Management Groups and wanted to leverage Policy Driven Governance to ensure consistency.
A centralised Log Analytics Workspace had been deployed in a Management Subscription which was to be used for Azure Monitor, and would receive diagnostic logs for all available resources configured via their Diagnostic Settings.
Requirements
The requirements for the policy to collect diagnostic data for Azure Monitor were as follows:
- All available Logs must be collected using Category Groups
- All Category Groups must be enabled and the policy configured to allow for new Category groups being added in the future
- All Metrics must be collected
- All Diagnostic data must be sent to the centralised Log Analytics Workspace
- Where the resources support it, all diagnostic data must be sent to the ‘Resource Specific’ Log Analytics Destination Table
If you haven’t come across Resource Specific tables in Log Analytics before, they store data in dedicated tables, which makes querying the data a lot simpler and has the added benefit of letting you fine-tune data retention values at the table level.
Engineering Approach
To see what was required to be configured in a policy, I decided to reverse engineer the solution by configuring the desired settings in the portal in my own Azure tenant, then viewing the resulting JSON. This is a technique I use a lot to automate the configuration of Azure resources, ensuring consistent results whether I’m configuring them via code or through the portal.
Below is an example of a Service Bus resource that has been configured to meet the requirements, and has the option selected to send logs to the ‘Resource Specific’ destination table in Log Analytics.
Once the settings had been configured, I clicked the JSON View link to display the resulting JSON.
You can see I have highlighted a few sections of the JSON that became relevant during the creation and testing process, so let me explain these further:
- API version - this is the version of the API the portal used to provision the diagnostic settings
- categoryGroup - there are separate Category Groups defined for both allLogs and audit, and the category for these is defined and set to null
- AllMetrics - this still uses category and there is no categoryGroup defined
- logAnalyticsDestinationType - this is set to Dedicated
The resulting JSON was formatted as follows:
{
  "id": "/subscriptions/SUBSCRIPTION-GUID/resourceGroups/RESOURCEGROUP-NAME/providers/Microsoft.ServiceBus/namespaces/RESOURCE-NAME/providers/microsoft.insights/diagnosticSettings/DIAGNOSTIC-SETTING-NAME",
  "name": "setByPolicy",
  "properties": {
    "logs": [
      {
        "category": null,
        "categoryGroup": "allLogs",
        "enabled": true,
        "retentionPolicy": {
          "days": 0,
          "enabled": false
        }
      },
      {
        "category": null,
        "categoryGroup": "audit",
        "enabled": false,
        "retentionPolicy": {
          "days": 0,
          "enabled": false
        }
      }
    ],
    "metrics": [
      {
        "enabled": true,
        "retentionPolicy": {
          "days": 0,
          "enabled": false
        },
        "category": "AllMetrics"
      }
    ],
    "workspaceId": "/subscriptions/SUBSCRIPTION-GUID/resourceGroups/RESOURCEGROUP-NAME/providers/Microsoft.OperationalInsights/workspaces/LOG-ANALYTICS-WORKSPACE-RESOURCE-NAME",
    "logAnalyticsDestinationType": "Dedicated"
  }
}
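As a quick sanity check, the highlighted fields can be pulled out of this JSON programmatically. A minimal sketch using Python's standard library (the JSON is abridged to just the fields discussed above):

```python
import json

# Abridged diagnostic-settings JSON as shown in the portal's JSON View
portal_json = """
{
  "properties": {
    "logs": [
      {"category": null, "categoryGroup": "allLogs", "enabled": true},
      {"category": null, "categoryGroup": "audit", "enabled": false}
    ],
    "metrics": [
      {"category": "AllMetrics", "enabled": true}
    ],
    "logAnalyticsDestinationType": "Dedicated"
  }
}
"""

props = json.loads(portal_json)["properties"]

# Logs use categoryGroup, with category set to null
category_groups = [log["categoryGroup"] for log in props["logs"]]
print(category_groups)                         # ['allLogs', 'audit']

# Metrics still use category, and have no categoryGroup at all
print(props["metrics"][0]["category"])         # AllMetrics
print("categoryGroup" in props["metrics"][0])  # False

# 'Resource Specific' tables appear as the 'Dedicated' destination type
print(props["logAnalyticsDestinationType"])    # Dedicated
```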
Checking For Existing Policies
I didn’t want to have to create a custom policy, and before going down this route it’s important to ensure there isn’t already a policy created by Microsoft that provides this functionality. To do this I used the excellent AzAdvertizer website. I chose this resource because it allowed me to search for policies based on their content as opposed to just their name.
My initial search is always based on the policy definition, and I do this by entering a search string in the Policy definition compressed column. This is great because any whitespace is removed from the JSON string, which ensures you get all relevant results and avoids any human errors in the JSON definition.
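To illustrate why the compressed column matters: a quote-to-quote substring search only lines up reliably once the JSON has been minified, since authors format their definitions with arbitrary whitespace. A rough sketch of the idea in Python:

```python
import json

# A fragment of a policy definition, formatted with arbitrary whitespace
definition = """
{
  "logs": [
    { "categoryGroup" : "allLogs", "enabled" : true }
  ]
}
"""

# A naive substring search over the raw text misses it because of the spacing
print('"categoryGroup":"allLogs"' in definition)  # False

# Minifying first (no spaces between tokens) makes the search reliable
compressed = json.dumps(json.loads(definition), separators=(",", ":"))
print('"categoryGroup":"allLogs"' in compressed)  # True
```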
I started by searching for "categoryGroup":"allLogs", which brought back hundreds of results. I then filtered this down further by clicking the funnel filter icon, selecting AND Contains and entering "category":"AllMetrics". This brought back over 30 results, so I clicked the filter icon again and added another AND Contains search for "logAnalyticsDestinationType":"Dedicated".

This returned no results at all. Suspecting I’d made a typo, or just got the search logic wrong, I tried searching just for policies that contained "logAnalyticsDestinationType":"Dedicated". Only 1 policy was returned.
At this point I knew I was going to have to create a custom policy to meet the requirements.
Calling AI - ‘Your Partner In Problem Solving’
They say that ‘a problem shared is a problem halved’, so I decided to try to get AI to generate a policy I could use instead of creating one from scratch.

Now before I dive into how I used AI for this, I have to state that I am a big fan of Generative AI. It can save a lot of time and effort and really help enhance solutions. However, it is not always right, and you should always validate and test its output.
There is a lot of hype around AI and Prompt Engineering at present and I don’t want to make this post all about that, so I will just share the prompt I used with Microsoft Copilot, ChatGPT & Google Gemini.
Act as an Azure Expert.
You have a requirement to ensure “Diagnostic Settings” are configured on all Azure “Service Bus” instances.
You need to ensure the “Category Groups” for “allLogs”, “audit” and “AllMetrics” are selected to “Send to Log Analytics Workspace” and use the “Resource Specific” destination table.
Please generate a policy definition to achieve this. The policy must check that the correct log analytics workspace ID is configured and not just that category groups are enabled.
The ID of the log analytics workspace will be provided by the user when assigning the policy at the Management Group level so should not be specified in the definition.
Any remediation should be carried out using the built in “Log Analytics Contributor” role to configure the settings, custom roles are not to be used.
Before generating the policy definition please ask any questions that will help you produce a more accurate response.
Now there are a couple of things to note from this prompt. Firstly, I made an error in stating “AllMetrics” is a Category Group; it is a Category, as we saw earlier. Secondly, I added an extra requirement to ensure the policy checks that the correct Log Analytics Workspace is configured. I did this after inspecting existing policies and discovering that they check Logs and Metrics are enabled, but don’t validate where they are being sent, which seemed pointless to me.
All three bots asked similar questions and came pretty close to giving me what I needed. However, none were 100% accurate.
The Policy Definition
I took the base policy definition Copilot generated and stripped it back to include only the information I was interested in. I also didn’t want any logic defined in the policy that I didn’t fully understand. I won’t go through all the settings that were incorrect, but I will now break down the sections of my final working policy definition.
The properties Object

This is fairly self-explanatory and contains the following:

- displayName - the name of the policy displayed in the Azure portal
- policyType - the type of the policy, which is always Custom for policies not created by Microsoft
- mode - this is set to All as per Microsoft’s recommendation and specifies that the policy applies to all resource types
- description - the description of the policy as displayed in the Azure portal
- metadata - the metadata object stores information about the policy definition
- parameters - the parameters object stores and validates information passed to the policy when it is assigned and can be used to set default values
- policyRule - the policyRule object is the conditional logic that needs to be matched for the policy to apply
{
  "properties": {
    "displayName": "Deploy Diagnostic Settings - Service Bus - Azure Monitor",
    "policyType": "Custom",
    "mode": "All",
    "description": "Configures diagnostic logs for the category groups 'allLogs' and 'audit' as well as the category 'AllMetrics' for a Service Bus resource to send to a Log Analytics workspace for use with Azure Monitor. The diagnostic logs are sent to the 'Resource Specific' Destination Table. NOTE: The identity carrying out remediation tasks requires the 'Log Analytics Contributor' role to be assigned on both the resource and the Log Analytics Workspace.",
    "metadata": {
    },
    "parameters": {
    },
    "policyRule": {
    }
  }
}
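If you build definitions like this in code, it can help to assemble the skeleton programmatically so none of the required top-level keys get missed. A minimal sketch (the key names come from the definition above; the helper function itself is hypothetical, not part of any Azure SDK):

```python
def policy_skeleton(display_name: str, description: str) -> dict:
    """Return the bare structure every custom policy definition needs."""
    return {
        "properties": {
            "displayName": display_name,
            "policyType": "Custom",  # always Custom for non-Microsoft policies
            "mode": "All",           # applies to all resource types
            "description": description,
            "metadata": {},
            "parameters": {},
            "policyRule": {},
        }
    }

skeleton = policy_skeleton(
    "Deploy Diagnostic Settings - Service Bus - Azure Monitor",
    "Configures diagnostic logs for Service Bus.",
)
print(sorted(skeleton["properties"].keys()))
# ['description', 'displayName', 'metadata', 'mode', 'parameters', 'policyRule', 'policyType']
```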
The main focus of this post will be on the policyRule object, as the metadata and parameters objects are kept simple.
The metadata Object

The metadata object stores extra data about the policy definition that you want to add in addition to Azure’s default values. Here we are just setting the version and the category:
"metadata": {
  "version": "1.0.0",
  "category": "Monitoring"
}
The parameters Object

The parameters object makes policy definitions more flexible and reusable, as it allows you to define values that can be customised when the policy is assigned. Here we are defining the logAnalytics parameter, which will hold the Log Analytics Workspace Resource ID:
"parameters": {
  "logAnalytics": {
    "type": "String",
    "metadata": {
      "displayName": "Log Analytics Workspace Resource ID",
      "description": "Specifies the resource ID of Log Analytics workspace to send logs to."
    }
  }
}
The policyRule Object

The policyRule object is made up of if and then objects. In the if object you set conditional logic that determines when the policy should be applied. In the then object you specify the action or effect that occurs when the conditions in the if object are true.
"policyRule": {
  "if": {
  },
  "then": {
  }
}
The if Object

The if object specifies conditional logic that needs to be met in order for the policy to apply. In this case, the type of resource needs to match "Microsoft.ServiceBus/namespaces".
"if": {
  "field": "type",
  "equals": "Microsoft.ServiceBus/namespaces"
}
The then Object

The then object is what happens when the if condition has been met. It contains the following:

- effect - the effect determines what action is taken if the compliance criteria are not met. In this case, the effect is set to deployIfNotExists, which will automatically deploy an Azure Resource Manager (ARM) template
- details - the details object provides additional information and parameters
"then": {
  "effect": "deployIfNotExists",
  "details": {
    "type": "Microsoft.Insights/diagnosticSettings",
    "existenceCondition": {
    },
    "roleDefinitionIds": [
    ],
    "deployment": {
    }
  }
}
The details Object

For this example the details object contains the following:
- type - the type of resource that the policy will be targeting
- existenceCondition - the compliance criteria that must be met
- roleDefinitionIds - a list of RBAC role(s) required to be assigned to perform the deployment
- deployment - the ARM template to be deployed
The existenceCondition Object

The existenceCondition object is key, as this is where the compliance of the policy will be determined. Think of this as all the things that need to be true for the policy to be considered compliant. This section needs extra consideration, as it evaluates conditional logic for both deployment and compliance. I say this because after creating the policy and watching it successfully apply, it was marked as non-compliant.

Let me explain further.

I initially used the logic from the JSON created when the settings were configured via the portal, which has separate objects for the allLogs and audit category groups. Part of the existenceCondition object looked like this:
"existenceCondition": {
  "allOf": [
    {
      "field": "Microsoft.ServiceBus/namespaces/diagnosticSettings[*].logs[*].categoryGroup",
      "contains": "allLogs"
    },
    {
      "field": "Microsoft.ServiceBus/namespaces/diagnosticSettings[*].logs[*].categoryGroup",
      "contains": "audit"
    }
  ]
}
From reading this I can see that categoryGroup needs to contain both allLogs and audit to be considered compliant, and this works perfectly for deployment.

However, when I checked the compliance of the policy I discovered it used different logic to check these settings.

I wasn’t sure how to fix this one, so I turned to Copilot again and asked:
Act as an Azure Expert.
You have created a policy as defined below but compliance fails as it sees ‘properties.logs[*].categoryGroup’ as ‘[“allLogs”,“audit”]’. The definition is below, how do I fix this?
Copilot replied with the following:
The issue you’re facing is likely due to the way the policy is checking for the categoryGroup in the existenceCondition. The policy is currently checking for allLogs and audit separately, but the compliance check is seeing them together as ["allLogs","audit"].

To fix this, you can modify the existenceCondition to check if both allLogs and audit are in the categoryGroup array. Here’s how you can modify it:
"existenceCondition": {
  "allOf": [
    {
      "equals": "true",
      "field": "Microsoft.Insights/diagnosticSettings/logs.enabled"
    },
    {
      "in": ["allLogs", "audit"],
      "field": "microsoft.insights/diagnosticSettings/logs[*].categoryGroup"
    },
    {
      "equals": "true",
      "field": "Microsoft.Insights/diagnosticSettings/metrics.enabled"
    },
    {
      "field": "Microsoft.Insights/diagnosticSettings/workspaceId",
      "matchInsensitively": "[parameters('logAnalytics')]"
    },
    {
      "equals": "Dedicated",
      "field": "Microsoft.Insights/diagnosticSettings/logAnalyticsDestinationType"
    }
  ]
}
In this modification, I replaced the separate checks for allLogs and audit with a single check using the in operator. This checks if the categoryGroup array contains both allLogs and audit.

Please try this modification and see if it resolves the compliance issue. If the issue persists, please let me know so we can investigate further.
I implemented the changes as I understood exactly what it was suggesting, and sure enough compliance was happy. Go Copilot!!!
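To make the difference concrete, here is my rough mental model of how the in operator evaluates an array alias. This is a simplified Python sketch only, not Azure Policy’s actual evaluation engine:

```python
# Simplified model: the [*] alias expands to every value in the array, and
# the 'in' condition holds when each expanded value appears in the allowed
# list. This is a mental model only, not Azure Policy's real evaluator.
def evaluate_in(field_values, allowed):
    return all(value in allowed for value in field_values)

# What compliance sees for
# microsoft.insights/diagnosticSettings/logs[*].categoryGroup
category_groups = ["allLogs", "audit"]

print(evaluate_in(category_groups, ["allLogs", "audit"]))  # True - compliant
print(evaluate_in(["allLogs", "somethingElse"], ["allLogs", "audit"]))  # False
```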
You will notice from the JSON above that there are additional settings in the existenceCondition object that I supplied to Copilot to meet our requirements:

- There is a check to ensure that both logs and metrics are enabled
- There is a check to ensure that the configured workspaceId matches the logAnalytics parameter specified when the policy is assigned
- The logAnalyticsDestinationType is set to Dedicated, which we saw from the JSON the portal produced earlier
The roleDefinitionIds Object

The roleDefinitionIds object is a list of RBAC roles that the identity performing the remediation will need to be assigned for a successful deployment to take place. In this case only one role is required, Log Analytics Contributor, which is specified by its ID of 92aaf0da-9dab-42b6-94a3-d43ce8d16293:

"roleDefinitionIds": [
  "/providers/Microsoft.Authorization/roleDefinitions/92aaf0da-9dab-42b6-94a3-d43ce8d16293"
]
The deployment Object

The deployment object is the ARM template that is deployed when the effect is set to deployIfNotExists. I’ve used ARM templates in the past and can understand them, but don’t consider myself an expert by any means. However, I was able to use some of the JSON generated by Copilot as starter code and modify it accordingly:
"deployment": {
  "properties": {
    "mode": "Incremental",
    "template": {
      "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
      "contentVersion": "1.0.0.0",
      "parameters": {
        "name": {
          "type": "string"
        },
        "logAnalytics": {
          "type": "string"
        }
      },
      "resources": [
        {
          "type": "Microsoft.ServiceBus/namespaces/providers/diagnosticSettings",
          "name": "[concat(parameters('name'), '/Microsoft.Insights/setByPolicy')]",
          "apiVersion": "2021-05-01-preview",
          "properties": {
            "workspaceId": "[parameters('logAnalytics')]",
            "logAnalyticsDestinationType": "Dedicated",
            "logs": [
              {
                "categoryGroup": "allLogs",
                "enabled": true
              },
              {
                "categoryGroup": "audit",
                "enabled": true
              }
            ],
            "metrics": [
              {
                "category": "AllMetrics",
                "enabled": true
              }
            ]
          }
        }
      ]
    },
    "parameters": {
      "name": {
        "value": "[field('name')]"
      },
      "logAnalytics": {
        "value": "[parameters('logAnalytics')]"
      }
    }
  }
}
The key changes I had to make as I went along were in the resources object, where I changed the following:

- The name was updated to setByPolicy to align with other diagnostic settings that had been deployed via policies elsewhere within the tenant
- The apiVersion was updated to match the version that was used in the portal when configuring the settings, as this version introduced support for the categoryGroup attribute
- The logAnalyticsDestinationType was set to Dedicated
- In the list of logs I had to update category to categoryGroup
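Those same gotchas can be lint-checked before a definition is assigned. The sketch below is a hypothetical helper, not part of the original workflow; the key paths mirror the definition in this post:

```python
# Hypothetical sanity checks for the gotchas called out above.
def check_policy(policy: dict) -> list:
    problems = []
    resource = (policy["properties"]["policyRule"]["then"]["details"]
                ["deployment"]["properties"]["template"]["resources"][0])
    props = resource["properties"]

    # categoryGroup needs a recent enough diagnosticSettings API version
    if resource["apiVersion"] < "2021-05-01":
        problems.append("apiVersion predates categoryGroup support")

    # log entries must use categoryGroup, not category
    for log in props["logs"]:
        if "categoryGroup" not in log:
            problems.append("log entry missing categoryGroup")

    # Resource Specific tables require the Dedicated destination type
    if props.get("logAnalyticsDestinationType") != "Dedicated":
        problems.append("logAnalyticsDestinationType is not Dedicated")
    return problems

# Minimal test fixture shaped like the definition in this post
policy = {"properties": {"policyRule": {"then": {"details": {"deployment": {
    "properties": {"template": {"resources": [{
        "apiVersion": "2021-05-01-preview",
        "properties": {
            "logAnalyticsDestinationType": "Dedicated",
            "logs": [{"categoryGroup": "allLogs", "enabled": True}],
        },
    }]}}}}}}}}

print(check_policy(policy))  # []
```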
Putting It All Together
When comparing the final JSON file to the content in this post you may notice a few of the fields appear in a different order to that shown earlier. This is because I like any JSON I write to read as clearly as possible. JSON is parsed by machines, which don’t care what order values are specified within an object as long as it is syntactically correct. A quick example would be the below, where the equals comes before the field:
{
  "equals": "true",
  "field": "Microsoft.Insights/diagnosticSettings/logs.enabled"
}
This is technically identical to how I would write it, but the version below makes more sense to me:
{
  "field": "Microsoft.Insights/diagnosticSettings/logs.enabled",
  "equals": "true"
}
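You can verify the two orderings are equivalent: parsed JSON objects compare equal regardless of key order.

```python
import json

a = json.loads('{"equals": "true", "field": "Microsoft.Insights/diagnosticSettings/logs.enabled"}')
b = json.loads('{"field": "Microsoft.Insights/diagnosticSettings/logs.enabled", "equals": "true"}')

# Key order is presentation only; the parsed objects are identical
print(a == b)  # True
```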
The finished working policy definition JSON file is below, so feel free to modify it and use within your own environments, after testing of course!
{
  "properties": {
    "displayName": "Deploy Diagnostic Settings - Service Bus - Azure Monitor",
    "policyType": "Custom",
    "mode": "All",
    "description": "Configures diagnostic logs for the category groups 'allLogs' and 'audit' as well as the category 'AllMetrics' for a Service Bus resource to send to a Log Analytics workspace for use with Azure Monitor. The diagnostic logs are sent to the 'Resource Specific' Destination Table. NOTE: The identity carrying out remediation tasks requires the 'Log Analytics Contributor' role to be assigned on both the resource and the Log Analytics Workspace.",
    "metadata": {
      "version": "1.0.0",
      "category": "Monitoring"
    },
    "parameters": {
      "logAnalytics": {
        "type": "String",
        "metadata": {
          "displayName": "Log Analytics Workspace Resource ID",
          "description": "Specifies the resource ID of Log Analytics workspace to send logs to."
        }
      }
    },
    "policyRule": {
      "if": {
        "field": "type",
        "equals": "Microsoft.ServiceBus/namespaces"
      },
      "then": {
        "effect": "deployIfNotExists",
        "details": {
          "type": "Microsoft.Insights/diagnosticSettings",
          "existenceCondition": {
            "allOf": [
              {
                "field": "Microsoft.Insights/diagnosticSettings/logs.enabled",
                "equals": "true"
              },
              {
                "field": "microsoft.insights/diagnosticSettings/logs[*].categoryGroup",
                "in": [
                  "allLogs",
                  "audit"
                ]
              },
              {
                "field": "Microsoft.Insights/diagnosticSettings/metrics.enabled",
                "equals": "true"
              },
              {
                "field": "Microsoft.Insights/diagnosticSettings/workspaceId",
                "matchInsensitively": "[parameters('logAnalytics')]"
              },
              {
                "field": "Microsoft.Insights/diagnosticSettings/logAnalyticsDestinationType",
                "equals": "Dedicated"
              }
            ]
          },
          "roleDefinitionIds": [
            "/providers/Microsoft.Authorization/roleDefinitions/92aaf0da-9dab-42b6-94a3-d43ce8d16293"
          ],
          "deployment": {
            "properties": {
              "mode": "Incremental",
              "template": {
                "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
                "contentVersion": "1.0.0.0",
                "parameters": {
                  "name": {
                    "type": "string"
                  },
                  "logAnalytics": {
                    "type": "string"
                  }
                },
                "resources": [
                  {
                    "type": "Microsoft.ServiceBus/namespaces/providers/diagnosticSettings",
                    "name": "[concat(parameters('name'), '/Microsoft.Insights/setByPolicy')]",
                    "apiVersion": "2021-05-01-preview",
                    "properties": {
                      "workspaceId": "[parameters('logAnalytics')]",
                      "logAnalyticsDestinationType": "Dedicated",
                      "logs": [
                        {
                          "categoryGroup": "allLogs",
                          "enabled": true
                        },
                        {
                          "categoryGroup": "audit",
                          "enabled": true
                        }
                      ],
                      "metrics": [
                        {
                          "category": "AllMetrics",
                          "enabled": true
                        }
                      ]
                    }
                  }
                ]
              },
              "parameters": {
                "name": {
                  "value": "[field('name')]"
                },
                "logAnalytics": {
                  "value": "[parameters('logAnalytics')]"
                }
              }
            }
          }
        }
      }
    }
  }
}
I covered how to add the policy definition using PowerShell in my previous post so head over there to check that out.