In this blog, we will learn about Azure governance and policy-driven guardrails to maintain control, security and compliance in cloud environments. This blog also covers the implementation of policy-driven guardrails with Terraform.

Introduction

Every IT environment has to deal with governance and control, whether it is in the cloud or on-premises. Cloud governance refers to the set of policies, procedures, and controls that an organisation puts in place to manage and regulate the use of cloud computing services. It is a critical aspect of cloud computing, as it helps organizations to maintain control, security, and compliance in their cloud environments.

Cloud governance typically involves the creation and enforcement of policies and procedures related to cloud service selection, deployment, configuration, management, and monitoring. It also covers areas such as data security, compliance, cost management, and risk management. Effective cloud governance requires collaboration between different stakeholders, including IT teams, security teams, compliance teams, and business stakeholders. It also requires a comprehensive understanding of the organisation’s cloud adoption strategy, business goals, and risk tolerance.

Microsoft Cloud Adoption Framework for Azure

The Microsoft Cloud Adoption Framework is a comprehensive guide to help organizations adopt and migrate to the cloud in a structured, efficient, and scalable manner. It is a collection of best practices, guidelines, and tools that provide organizations with a clear roadmap to plan, implement, and manage their cloud adoption journey.

The framework covers a wide range of topics, including cloud governance, security, compliance, migration, and application modernisation. The Microsoft Cloud Adoption Framework is an essential resource for organizations that want to achieve a successful and efficient cloud adoption journey. It provides a structured approach to help organizations achieve their cloud goals, while minimising risk and maximising the benefits of the cloud.

Policy-driven Guardrails

Policy-driven guardrails are the policies and controls that organizations put in place to enforce compliance, security, and best practices in their cloud environments. Guardrails act as a form of guidance and constraint to ensure that resources are used in a controlled and secure manner, while allowing for flexibility and agility in the cloud environment. Guardrails typically cover areas such as cost management, security, compliance, and operational best practices. They are often implemented as a set of automated policies that help to detect and prevent unauthorised or non-compliant actions in the cloud environment. For example, a guardrail may prevent users from launching a virtual machine that does not meet certain security or compliance standards, or limit the use of certain cloud services to specific teams or departments.

Corporate Policy

When policies are defined as code, they can be automatically enforced. These policies use a declarative approach to specify the desired state of resources and detect any deviations from the defined policies. Policy-as-code is an approach to cloud governance that involves defining and enforcing policies using code rather than manual processes. It is a way of automating policy management and enforcement, making it easier and more efficient to ensure compliance and security in the cloud environment. Policy-as-code provides several benefits for cloud governance. It reduces the risk of human error and ensures consistent policy enforcement across the entire cloud environment. It also provides greater visibility and control over the cloud resources, making it easier to ensure compliance and security. Finally, policy-as-code can help to reduce the workload of IT teams by automating policy management and enforcement.

Implementing policy-driven guardrails with Terraform

In the upcoming steps, we will implement Azure Policy resources as policy-driven guardrails for our Azure environment. To ensure consistency and scalability, we will use code to deploy and manage these resources using Terraform. By using policy-as-code, we can enforce compliance with corporate standards and best practices for resource deployments in Azure, while also automating governance processes to minimize the risk of misconfigurations and security breaches. Through this approach, we can help ensure that our Azure environment is secure, compliant, and optimized for cost and performance.

When implementing Azure Policy, it is important to understand how Azure Policy effects work. Azure Policy effects determine how a policy is enforced on Azure resources. Each effect has a specific purpose and should be used in specific scenarios. Here is an explanation of each effect and when to use them:

  • Append: This effect adds extra fields to a resource while it is being created or updated. It is useful when you want to add a setting that the request left out. For example, you can use an append effect to add allowed IPs to a storage resource.
  • Audit: This effect records a warning event in the activity log and marks the resource as non-compliant, without blocking the request. It is useful when you want to monitor the compliance of your resources without blocking deployments.
  • AuditIfNotExists: This effect is similar to the audit effect, but it looks at resources related to the one that matched the policy rule. It reports non-compliance when the related resource does not exist or does not satisfy the existence condition. It is useful when you want to check that a companion resource, such as a virtual machine extension, is deployed.
  • Deny: This effect denies the creation or update of a resource that violates the policy. It is useful when you want to enforce strict compliance on your resources.
  • DenyAction (preview): This effect blocks specific actions on resources in scope, regardless of how those resources are configured. At the time of writing the only supported action is DELETE, which makes the effect useful for protecting critical resources against accidental deletion.
  • DeployIfNotExists: This effect runs a template deployment when a related resource does not exist or does not satisfy the existence condition. It is useful when you want to ensure that a companion resource, such as an extension or a diagnostic setting, is deployed in your environment.
  • Disabled: This effect disables the policy. It is useful when you want to temporarily suspend a policy without deleting it.
  • Manual (preview): This effect does not evaluate compliance automatically. Instead, the compliance state of a resource or scope is set by hand through attestations. It is useful for controls that cannot be checked automatically, such as a process or sign-off requirement.
  • Modify: This effect adds, updates, or removes tags or properties on a resource during creation or update, and existing resources can be corrected with a remediation task. It is useful when you want to enforce a specific configuration on your resources. For example, you can use a modify effect to add a missing CostCenter tag.

In short: use the append effect to add fields to a resource, the audit effect to monitor compliance, the auditIfNotExists effect to check that a related resource exists, the deny effect to enforce strict compliance, the denyAction effect to block actions such as deletion, the deployIfNotExists effect to deploy a missing related resource, the disabled effect to temporarily suspend a policy, the manual effect for controls that have to be attested by hand, and the modify effect to enforce specific tags or properties. To learn more about Azure Policy effects, please refer to the official Microsoft documentation.
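A common way to work with the effects above is to make the effect itself a parameter, so the same definition can be assigned in Audit mode first and switched to Deny later. The following is a minimal sketch, not code from the article's repository. It assumes the azurerm provider is already configured, and all resource names, the tag name, and the rule are hypothetical.

```hcl
# Illustrative only: names and the Owner tag rule are assumptions.
data "azurerm_subscription" "current" {}

resource "azurerm_policy_definition" "rg_owner_tag" {
  name         = "rg-owner-tag"
  policy_type  = "Custom"
  mode         = "All" # "All" so that resource groups are evaluated as well
  display_name = "Resource groups should have an Owner tag"

  # The effect is a parameter, so each assignment decides how strict to be.
  parameters = jsonencode({
    effect = {
      type          = "String"
      allowedValues = ["Audit", "Deny", "Disabled"]
      defaultValue  = "Audit"
    }
  })

  policy_rule = jsonencode({
    if = {
      allOf = [
        { field = "type", equals = "Microsoft.Resources/subscriptions/resourceGroups" },
        { field = "tags['Owner']", exists = "false" }
      ]
    }
    then = {
      effect = "[parameters('effect')]"
    }
  })
}

resource "azurerm_subscription_policy_assignment" "rg_owner_tag" {
  name                 = "rg-owner-tag"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = azurerm_policy_definition.rg_owner_tag.id

  # Start with Audit to measure the impact, then change the value to "Deny".
  parameters = jsonencode({
    effect = { value = "Audit" }
  })
}
```

Starting in Audit mode shows which existing resources would be affected before anything is blocked.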

In addition to Azure Policy effects, it is also important to know how Azure Policy remediations work. Remediation applies to policies with the deployIfNotExists or modify effect. Resources that are created or updated after the policy is assigned are corrected by the effect itself. Resources that already existed and are non-compliant are only corrected when a remediation task is run for them. Here’s how Azure Policy remediations work:

  1. A policy definition with a deployIfNotExists or modify effect is created and assigned to a specific scope, such as a subscription, resource group, or individual resource. The policy definition describes the rules that need to be enforced.
  2. The policy assignment is given a managed identity, and that identity is granted the roles listed in the policy definition so that it is allowed to change resources.
  3. Azure Policy evaluates resources within the assigned scope for compliance with the policy definition, both when resources are created or updated and during periodic evaluation cycles. Existing resources that do not comply are reported as non-compliant.
  4. A remediation task is created for the assignment, either manually or through automation such as Terraform. The task runs the deployment or modify operations defined in the policy against the non-compliant resources, using the managed identity of the assignment.
  5. After the remediation task is complete, Azure Policy evaluates the resources again to confirm that they are now compliant with the policy definition.

Azure Policy remediations provide an automated, scalable, and consistent way to maintain compliance across your Azure environment. By automating the remediation process, organizations can reduce the risk of policy violations and bring resources back in line with their policy definitions without manual rework.
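The remediation flow can be sketched in Terraform as an assignment with a managed identity, a role assignment for that identity, and a remediation task. This is an illustration under assumptions, not the article's own code. It assumes a configured azurerm provider, uses the built-in definition "Inherit a tag from the resource group if missing" (looked up by display name), and picks the Contributor role. The roles listed in the policy definition are the authoritative source for what the identity needs.

```hcl
# Illustrative only: resource names, tag name, and region are assumptions.
data "azurerm_subscription" "current" {}

# Built-in definition that uses the modify effect.
data "azurerm_policy_definition" "inherit_tag" {
  display_name = "Inherit a tag from the resource group if missing"
}

resource "azurerm_subscription_policy_assignment" "inherit_cost_center" {
  name                 = "inherit-cost-center"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = data.azurerm_policy_definition.inherit_tag.id
  location             = "westeurope" # a location is required when an identity is attached

  # The identity that performs the modify operation.
  identity {
    type = "SystemAssigned"
  }

  parameters = jsonencode({
    tagName = { value = "CostCenter" }
  })
}

# Grant the assignment's identity permission to change resources.
resource "azurerm_role_assignment" "inherit_cost_center" {
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "Contributor"
  principal_id         = azurerm_subscription_policy_assignment.inherit_cost_center.identity[0].principal_id
}

# Remediation task for resources that were already non-compliant.
resource "azurerm_subscription_policy_remediation" "inherit_cost_center" {
  name                 = "inherit-cost-center"
  subscription_id      = data.azurerm_subscription.current.id
  policy_assignment_id = azurerm_subscription_policy_assignment.inherit_cost_center.id

  # Wait until the identity has its role before remediating.
  depends_on = [azurerm_role_assignment.inherit_cost_center]
}
```

The explicit depends_on matters because the remediation task has no direct reference to the role assignment, yet it cannot succeed until the identity is allowed to write tags.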

Prerequisites

Let’s get started with implementing policy-driven guardrails using Terraform. Building CI/CD pipelines will not be covered in this blog article. Before you can start deploying, there are some prerequisites to fulfill:

  • Azure tenant and user account: You’ll need an Azure tenant, which is an instance of Azure Active Directory (Azure AD, now named Microsoft Entra ID). This instance is the foundation of the environment, and it allows you to create an identity (user account) to connect to Azure, set up the environment, and deploy the resources.
  • Subscription: You’ll need a subscription and Owner permissions to deploy the resources.
  • Terraform: You’ll need the Terraform command-line interface to deploy and manage Azure resources. You can find more information about Terraform and the AzureRM provider in the documentation.

Once you have fulfilled the prerequisites, we are ready to move forward and implement our policy-driven guardrails! We’ll cover the implementation through various examples. All of the examples below can be downloaded from this GitHub Repository.
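As a starting point, a provider configuration along the following lines is typically needed before any policy resources can be deployed. This is a minimal sketch. The version constraints are assumptions, not the versions used in the article's repository, and authentication is assumed to come from an existing Azure CLI login or from environment variables.

```hcl
terraform {
  required_version = ">= 1.3" # assumption: any reasonably recent Terraform CLI

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # assumption: pin to the version you have tested
    }
  }
}

provider "azurerm" {
  features {}
}

# The subscription Terraform is authenticated against; useful as an assignment scope.
data "azurerm_subscription" "current" {}

output "subscription_id" {
  value = data.azurerm_subscription.current.subscription_id
}
```

With this in place, the usual workflow is terraform init, followed by terraform plan and terraform apply.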

Example 1: Resource Management

In this example we implement resource management policy-driven guardrails to enforce compliance with corporate standards and best practices for resource deployments. This example enforces deployment of resources to a specific region and with specific SKUs, using the Deny effect. This may be important because deploying resources to specific regions can help ensure compliance with data sovereignty and regulatory requirements, while using specific SKUs can help ensure consistent performance and cost optimization.

The west-europe-only.tf and allowed-vm-sizes.tf files both deploy two resources: the Azure Policy definition, and the Azure Policy assignment. The Azure Policy definition is a set of rules or requirements that is created to enforce specific policies within an Azure environment. The policy definition includes details such as the name, description, and the rules that are used to evaluate resources for compliance. The Azure Policy assignment is the actual application of a policy definition to a specific scope within an Azure environment, such as a subscription or resource group. When an Azure Policy assignment is applied, it enforces the rules defined in the policy definition for all resources within the assigned scope.

In other words, the policy definition is the blueprint for the policy, while the policy assignment is the actual application of the policy. You can have multiple policy assignments for a single policy definition, each with its own scope and parameters. This allows you to tailor the policy enforcement to specific areas of your Azure environment. Azure Policy definitions can be created and managed independently of any assignments. However, an Azure Policy assignment cannot exist without a policy definition.
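To make the relationship between one definition and several assignments concrete, here is a small sketch that assigns the built-in "Allowed locations" definition twice with different parameters. It assumes a configured azurerm provider. The resource group name rg-sandbox and the region lists are hypothetical.

```hcl
# Illustrative only: the resource group and region lists are assumptions.
data "azurerm_subscription" "current" {}

data "azurerm_resource_group" "sandbox" {
  name = "rg-sandbox"
}

# One definition (built-in, looked up by display name)...
data "azurerm_policy_definition" "allowed_locations" {
  display_name = "Allowed locations"
}

# ...assigned once at subscription scope...
resource "azurerm_subscription_policy_assignment" "locations_default" {
  name                 = "allowed-locations"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = data.azurerm_policy_definition.allowed_locations.id

  # Exclude the sandbox so that its own, wider assignment is the one that applies there.
  not_scopes = [data.azurerm_resource_group.sandbox.id]

  parameters = jsonencode({
    listOfAllowedLocations = { value = ["westeurope"] }
  })
}

# ...and once more at resource group scope with different parameters.
resource "azurerm_resource_group_policy_assignment" "locations_sandbox" {
  name                 = "allowed-locations-sandbox"
  resource_group_id    = data.azurerm_resource_group.sandbox.id
  policy_definition_id = data.azurerm_policy_definition.allowed_locations.id

  parameters = jsonencode({
    listOfAllowedLocations = { value = ["westeurope", "northeurope"] }
  })
}
```

The exclusion is needed because assignments are cumulative: without it, the stricter subscription-level assignment would still deny North Europe inside the sandbox resource group.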

The west-europe-only.tf file contains the Terraform code to create an Azure Policy definition that ensures all resources are deployed in the West Europe region, and assigns it to a scope in our subscription (specified by the scope parameter). This policy will deny resource deployments outside of the West Europe region.
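The repository file is not reproduced here. As a rough sketch of what a definition and assignment of this kind could look like, assuming a configured azurerm provider and hypothetical resource names:

```hcl
# Illustrative only: not the contents of west-europe-only.tf.
data "azurerm_subscription" "current" {}

resource "azurerm_policy_definition" "west_europe_only" {
  name         = "west-europe-only"
  policy_type  = "Custom"
  mode         = "Indexed" # only resource types that support location and tags
  display_name = "Resources may only be deployed in West Europe"

  policy_rule = jsonencode({
    if = {
      # "global" is kept so that region-less resources are not blocked.
      field = "location"
      notIn = ["westeurope", "global"]
    }
    then = {
      effect = "deny"
    }
  })
}

resource "azurerm_subscription_policy_assignment" "west_europe_only" {
  name                 = "west-europe-only"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = azurerm_policy_definition.west_europe_only.id
}
```

Allowing the "global" location is a design choice in this sketch. Some Azure resources are not tied to a region, and a rule that only accepts westeurope would reject them.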

The allowed-vm-sizes.tf file contains the Terraform code to create an Azure Policy definition that allows only specifically listed virtual machine sizes for deployment, and assigns it to a scope in our subscription. This policy will deny deployments of virtual machines whose size is not included in the allowed list.
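A policy like this is usually written with the allowed sizes as a parameter, so the list can change without touching the rule. The sketch below assumes a configured azurerm provider. The parameter name allowedSizes and the two example sizes are hypothetical, and it is not the contents of the repository file.

```hcl
# Illustrative only: parameter name and size list are assumptions.
data "azurerm_subscription" "current" {}

resource "azurerm_policy_definition" "allowed_vm_sizes" {
  name         = "allowed-vm-sizes"
  policy_type  = "Custom"
  mode         = "Indexed"
  display_name = "Allowed virtual machine sizes"

  parameters = jsonencode({
    allowedSizes = {
      type = "Array"
      metadata = {
        displayName = "Allowed sizes"
        description = "Virtual machine sizes that may be deployed."
      }
    }
  })

  policy_rule = jsonencode({
    if = {
      allOf = [
        { field = "type", equals = "Microsoft.Compute/virtualMachines" },
        {
          # Match virtual machines whose size is NOT in the allowed list.
          not = {
            field = "Microsoft.Compute/virtualMachines/sku.name"
            in    = "[parameters('allowedSizes')]"
          }
        }
      ]
    }
    then = {
      effect = "deny"
    }
  })
}

resource "azurerm_subscription_policy_assignment" "allowed_vm_sizes" {
  name                 = "allowed-vm-sizes"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = azurerm_policy_definition.allowed_vm_sizes.id

  parameters = jsonencode({
    allowedSizes = { value = ["Standard_B2s", "Standard_D2s_v5"] }
  })
}
```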

Example 2: Resource Tagging

In this example we implement resource tagging policy-driven guardrails to enforce the use of department, owner, and cost center tags. This provides a consistent way to organize and manage resources, making it easier to identify and manage costs, and track usage. Well-defined naming and metadata tagging conventions help to quickly locate and manage resources. These conventions also help associate cloud usage costs with business teams via chargeback and showback accounting mechanisms. By enforcing these policies, we ensure that resources are tagged consistently and accurately, reducing the risk of mismanagement and ensuring that resources are optimized for cost and performance. This can help improve the governance of the Azure environment, streamline management processes, and ensure that resources are aligned with business needs.

The resource-tagging.tf file contains the Terraform code to create an Azure Policy definition that will enforce that all resources in the subscription have the required tags (CostCenter, Owner, Department) applied, and if any of these tags is missing, the policy will modify the resource and add the tags with a timestamp as their value. We also create an Azure Policy assignment that assigns this policy to the subscription specified in the scope parameter.
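One possible shape for such a tagging guardrail is sketched below. It differs from the description above in one respect: it creates one small definition per tag with for_each rather than a single definition covering all three tags, so that each tag is only added when that particular tag is missing. It assumes a configured azurerm provider. The resource names, the region, and the use of the Contributor role are assumptions.

```hcl
# Illustrative only: not the contents of resource-tagging.tf.
data "azurerm_subscription" "current" {}

locals {
  required_tags = ["CostCenter", "Owner", "Department"]
}

resource "azurerm_policy_definition" "add_tag" {
  for_each     = toset(local.required_tags)
  name         = "add-missing-tag-${lower(each.key)}"
  policy_type  = "Custom"
  mode         = "Indexed"
  display_name = "Add the ${each.key} tag when it is missing"

  policy_rule = jsonencode({
    if = {
      field  = "tags['${each.key}']"
      exists = "false"
    }
    then = {
      effect = "modify"
      details = {
        # Contributor role, used by the assignment's managed identity.
        roleDefinitionIds = [
          "/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
        ]
        operations = [
          # The tag value is the evaluation timestamp, as described above.
          { operation = "add", field = "tags['${each.key}']", value = "[utcNow()]" }
        ]
      }
    }
  })
}

resource "azurerm_subscription_policy_assignment" "add_tag" {
  for_each             = azurerm_policy_definition.add_tag
  name                 = each.value.name
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = each.value.id
  location             = "westeurope" # required because of the identity

  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_role_assignment" "add_tag" {
  for_each             = azurerm_subscription_policy_assignment.add_tag
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "Contributor"
  principal_id         = each.value.identity[0].principal_id
}
```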

Example 3: Encryption at Host

In this example we implement a policy-driven guardrail that checks whether virtual machines and virtual machine scale sets have encryption at host enabled. This may be important for your organization to ensure information security, or to comply with regulatory and/or industry benchmarks. When you enable encryption at host, data stored on the VM host is encrypted at rest and flows encrypted to the Storage service. If the virtual machines or virtual machine scale sets don’t have encryption at host enabled, the policy reports them as non-compliant through an Audit effect. The example can be found within the encryption-at-host.tf file. This file creates the Azure Policy definition, and the Azure Policy Assignment that assigns the “encryption-at-host” policy definition to the current subscription.
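An audit rule of this kind could look roughly like the sketch below. It assumes a configured azurerm provider and is not the contents of encryption-at-host.tf. The two property aliases are written from memory of the equivalent built-in policy and should be verified against the aliases available in your environment before use.

```hcl
# Illustrative only: verify the alias names before relying on this rule.
data "azurerm_subscription" "current" {}

resource "azurerm_policy_definition" "encryption_at_host" {
  name         = "encryption-at-host"
  policy_type  = "Custom"
  mode         = "Indexed"
  display_name = "Virtual machines and scale sets should have encryption at host enabled"

  policy_rule = jsonencode({
    if = {
      anyOf = [
        {
          # Virtual machines without encryption at host
          allOf = [
            { field = "type", equals = "Microsoft.Compute/virtualMachines" },
            { field = "Microsoft.Compute/virtualMachines/securityProfile.encryptionAtHost", notEquals = "true" }
          ]
        },
        {
          # Scale sets without encryption at host
          allOf = [
            { field = "type", equals = "Microsoft.Compute/virtualMachineScaleSets" },
            { field = "Microsoft.Compute/virtualMachineScaleSets/virtualMachineProfile.securityProfile.encryptionAtHost", notEquals = "true" }
          ]
        }
      ]
    }
    then = {
      effect = "audit"
    }
  })
}

resource "azurerm_subscription_policy_assignment" "encryption_at_host" {
  name                 = "encryption-at-host"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = azurerm_policy_definition.encryption_at_host.id
}
```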

Example 4: Endpoint Protection

In this example we implement a guardrail to check whether virtual machines have endpoint protection installed. If a virtual machine doesn’t have endpoint protection installed, the policy reports it as non-compliant through an Audit effect. The example can be found within the endpoint-protection.tf file. This file creates the Azure Policy definition, and the Azure Policy Assignment that assigns the “endpoint-protection” policy definition to the current subscription.
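How the repository file detects endpoint protection is not shown here. One way to express "is a protection agent installed" is the closely related auditIfNotExists effect, which audits a virtual machine when a matching extension is missing. The sketch below takes that approach and is therefore a variation, not the article's implementation. It assumes a configured azurerm provider, limits itself to Windows virtual machines, and treats the Microsoft Antimalware extension (publisher Microsoft.Azure.Security, type IaaSAntimalware) as the endpoint protection solution, which is an assumption.

```hcl
# Illustrative only: the extension publisher and type are assumptions.
data "azurerm_subscription" "current" {}

resource "azurerm_policy_definition" "endpoint_protection" {
  name         = "endpoint-protection"
  policy_type  = "Custom"
  mode         = "Indexed"
  display_name = "Windows virtual machines should have an antimalware extension"

  policy_rule = jsonencode({
    if = {
      allOf = [
        { field = "type", equals = "Microsoft.Compute/virtualMachines" },
        { field = "Microsoft.Compute/virtualMachines/storageProfile.osDisk.osType", like = "Windows*" }
      ]
    }
    then = {
      effect = "auditIfNotExists"
      details = {
        # Related resource type to look for on each matching virtual machine.
        type = "Microsoft.Compute/virtualMachines/extensions"
        existenceCondition = {
          allOf = [
            { field = "Microsoft.Compute/virtualMachines/extensions/publisher", equals = "Microsoft.Azure.Security" },
            { field = "Microsoft.Compute/virtualMachines/extensions/type", equals = "IaaSAntimalware" }
          ]
        }
      }
    }
  })
}

resource "azurerm_subscription_policy_assignment" "endpoint_protection" {
  name                 = "endpoint-protection"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = azurerm_policy_definition.endpoint_protection.id
}
```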

Example 5: Guest Configuration

Guest Configuration policies help ensure that virtual machines deployed in Azure comply with the organization’s security and compliance requirements. They use the Guest Configuration feature in Azure Policy to enforce configuration standards on virtual machines. Azure Guest Configuration policies enable you to audit and enforce configurations inside your virtual machines by evaluating settings against pre-defined policy definitions.

In order to assess and enforce these configurations, Guest Configuration policies require a way to authenticate to the virtual machine and access its configuration settings. Managed identities in Azure provide a secure way to grant Azure services and resources access to other Azure resources. Specifically, managed identities can be used to authenticate to virtual machines without exposing credentials in code or configuration files. By using a managed identity, you can ensure that Guest Configuration policies have secure access to the virtual machine’s configuration data without exposing sensitive information. In this example, we will ensure that a system-assigned managed identity and the Guest Configuration extension are enforced on Linux virtual machines, so guest configuration capabilities can be used.

The vm-managed-identity.tf file contains the Terraform code to create a policy definition, policy assignment, role assignment, and policy remediation in Azure. This Terraform code automates the process of creating and assigning an Azure policy that adds a system-assigned managed identity to virtual machines hosted in Azure that are supported by Guest Configuration but do not have any managed identities, and then assigns the Contributor role to that managed identity.
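The four resources mentioned above could fit together roughly as follows. This is a simplified sketch, not the contents of vm-managed-identity.tf. It assumes a configured azurerm provider, targets every virtual machine without an identity rather than only those supported by Guest Configuration, and grants the Contributor role to the managed identity of the policy assignment so that the modify operation is allowed to run.

```hcl
# Illustrative only: names, region, and the simplified rule are assumptions.
data "azurerm_subscription" "current" {}

resource "azurerm_policy_definition" "vm_managed_identity" {
  name         = "vm-managed-identity"
  policy_type  = "Custom"
  mode         = "Indexed"
  display_name = "Add a system-assigned managed identity to virtual machines without one"

  policy_rule = jsonencode({
    if = {
      allOf = [
        { field = "type", equals = "Microsoft.Compute/virtualMachines" },
        {
          anyOf = [
            { field = "identity.type", exists = "false" },
            { field = "identity.type", equals = "None" }
          ]
        }
      ]
    }
    then = {
      effect = "modify"
      details = {
        # Contributor role
        roleDefinitionIds = [
          "/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
        ]
        operations = [
          { operation = "addOrReplace", field = "identity.type", value = "SystemAssigned" }
        ]
      }
    }
  })
}

resource "azurerm_subscription_policy_assignment" "vm_managed_identity" {
  name                 = "vm-managed-identity"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = azurerm_policy_definition.vm_managed_identity.id
  location             = "westeurope" # required because of the identity

  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_role_assignment" "vm_managed_identity" {
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "Contributor"
  principal_id         = azurerm_subscription_policy_assignment.vm_managed_identity.identity[0].principal_id
}

# Corrects virtual machines that existed before the policy was assigned.
resource "azurerm_subscription_policy_remediation" "vm_managed_identity" {
  name                 = "vm-managed-identity"
  subscription_id      = data.azurerm_subscription.current.id
  policy_assignment_id = azurerm_subscription_policy_assignment.vm_managed_identity.id

  depends_on = [azurerm_role_assignment.vm_managed_identity]
}
```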

The linux-vm-guest-configuration.tf file contains the Terraform code that defines an Azure Policy definition that will deploy the Linux Guest Configuration extension to supported Linux virtual machines in Azure. If a supported virtual machine does not yet have the extension, the policy will take effect and deploy the Azure Policy for Linux extension. The deployment of the extension will be performed using an incremental deployment mode.
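A deployIfNotExists definition for the extension could be sketched as below. It is modelled loosely on the equivalent built-in policy and is not the contents of linux-vm-guest-configuration.tf. It assumes a configured azurerm provider. Detecting Linux through the OS disk type is a simplification, and the alias names, extension version, and API version are assumptions to verify. The assignment, identity, and role assignment are left out and would follow the same pattern as for any other deployIfNotExists or modify policy.

```hcl
# Illustrative only: definition without assignment; verify aliases and versions.
resource "azurerm_policy_definition" "linux_guest_configuration" {
  name         = "linux-vm-guest-configuration"
  policy_type  = "Custom"
  mode         = "Indexed"
  display_name = "Deploy the Guest Configuration extension on Linux virtual machines"

  policy_rule = jsonencode({
    if = {
      allOf = [
        { field = "type", equals = "Microsoft.Compute/virtualMachines" },
        { field = "Microsoft.Compute/virtualMachines/storageProfile.osDisk.osType", like = "Linux*" }
      ]
    }
    then = {
      effect = "deployIfNotExists"
      details = {
        type = "Microsoft.Compute/virtualMachines/extensions"
        name = "AzurePolicyforLinux"
        # Virtual Machine Contributor role
        roleDefinitionIds = [
          "/providers/Microsoft.Authorization/roleDefinitions/9980e02c-c2be-4d73-94e8-173b1dc7cf3c"
        ]
        # The virtual machine is compliant when this extension is present.
        existenceCondition = {
          allOf = [
            { field = "Microsoft.Compute/virtualMachines/extensions/publisher", equals = "Microsoft.GuestConfiguration" },
            { field = "Microsoft.Compute/virtualMachines/extensions/type", equals = "ConfigurationforLinux" }
          ]
        }
        deployment = {
          properties = {
            mode = "incremental"
            parameters = {
              vmName   = { value = "[field('name')]" }
              location = { value = "[field('location')]" }
            }
            template = {
              "$schema"      = "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#"
              contentVersion = "1.0.0.0"
              parameters = {
                vmName   = { type = "string" }
                location = { type = "string" }
              }
              resources = [
                {
                  type       = "Microsoft.Compute/virtualMachines/extensions"
                  apiVersion = "2019-07-01"
                  name       = "[concat(parameters('vmName'), '/AzurePolicyforLinux')]"
                  location   = "[parameters('location')]"
                  properties = {
                    publisher               = "Microsoft.GuestConfiguration"
                    type                    = "ConfigurationforLinux"
                    typeHandlerVersion      = "1.0"
                    autoUpgradeMinorVersion = true
                  }
                }
              ]
            }
          }
        }
      }
    }
  })
}
```

Writing the embedded template with jsonencode rather than a JSON heredoc keeps the whole rule in one syntax and lets Terraform catch structural mistakes at plan time.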

If you want to learn more about Guest Configuration, please refer to the Microsoft documentation, which contains several built-in policy samples. Specifically for applying guest configuration policies to a virtual machine with Terraform, please refer to the Terraform documentation.

Closing words

Policy-driven guardrails are a critical component of cloud governance and play a crucial role in ensuring the security, compliance, and optimisation of an organisation’s cloud environment. By automating policy enforcement and providing a consistent framework for policy management, policy-driven guardrails help organizations to mitigate the risks and challenges associated with cloud adoption.

The Microsoft Cloud Adoption Framework provides a comprehensive guide for organizations to plan, implement, and govern their cloud adoption journey. It emphasises the importance of governance throughout the entire cloud adoption process and provides best practices and tools for implementing policy-driven guardrails. To effectively implement policy-driven guardrails, organizations should consider a code-based (policy-as-code) approach. It is important to regularly review and update the guardrails to ensure that they remain relevant and effective.

To learn more about the topics that were covered in this blog article, refer to the links below:

Thank you for taking the time to go through this post and making it to the end. Stay tuned, because we’ll keep publishing more content on topics like these in the future.