Creating an Azure Kubernetes Service (AKS) Cluster within a Virtual Network (VNET) using Terraform
Hello readers! Today, we're going to walk you through creating an Azure Kubernetes Service (AKS) cluster within a Virtual Network (VNET) in Azure using Terraform.
Terraform is a popular Infrastructure as Code (IaC) tool that allows you to provision and manage resources in your cloud environment. AKS, on the other hand, is a managed container service that simplifies Kubernetes deployment and operations.
What are we going to create?
In this blog we will look at how to create the following resources in Azure:
Resource group
Virtual Network (VNET)
Subnet
AKS Cluster with default system nodepool
Optionally create worker nodepools
Connect to the AKS cluster and validate its functionality by installing the nginx Helm chart
We will not go deep into Terraform module basics here, as we have already covered those in detail in our blog Create EKS cluster within its VPC.
The complete terraform code for what we will discuss below is in this repository.
Prerequisites
Basic understanding of Azure, Terraform and Kubernetes.
An active Azure account. If you don't have one, you can create a free account, along with an Azure subscription where you want to create the resources.
Azure CLI installed.
Terraform installed.
kubectl compatible with the AKS version you are installing.
terraform-docs if you want to auto-generate the documentation, and tfswitch to manage multiple versions of Terraform.
helm, a package manager for Kubernetes manifests; we will use it to install the nginx Helm chart once the cluster is created.
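Before moving on, you can optionally run a quick sanity check to confirm the tooling is installed and on your PATH (the exact output will vary with your setup):
# verify the CLIs used throughout this blog are available
az version
terraform version
kubectl version --client
helm version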
Modular structure
The following is the directory layout we have used to structure the Terraform modules. You can simply clone this repository, which contains all the Terraform manifests needed to create the AKS cluster within its VNET.
Please refer to the blog on how to Create EKS cluster within its VPC to understand the Terraform modularization and the file structure below. There we have explained the main.tf, variables.tf, outputs.tf and .tfvars files in detail.
my-aks-tf/ # root directory
├── cluster # scaffold module which invokes aks and vnet_and_subnets module
│ ├── main.tf
│ ├── outputs.tf
│ └── variables.tf
├── modules
│ ├── aks # module to create k8s cluster and worker nodepools
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ └── variables.tf
│ └── vnet_and_subnets # module to create resource group, vnet and subnet
│ ├── main.tf
│ ├── outputs.tf
│ └── variables.tf
├── main.tf # invokes cluster module to create aks cluster in its vnet
├── outputs.tf
├── sample.tfvars # sample variables file
└── variables.tf
Terraform Modules
The following are the Terraform modules we will create in the my-aks-tf directory. You can refer to the section above for the directory structure. We will look at the respective Terraform files below. Please note the files may have been abbreviated for brevity; the complete code is available in this repository.
modules
These are the APIs created by the Platform team. In the real world, these modules can be separated out into their own dedicated repository and consumed as remote modules by users wanting to claim infrastructure.
vnet_and_subnets
This is an opinionated module created by the Platform team to create an Azure Resource Group, an Azure Virtual Network and an Azure Subnet. Create the following files under the modules/vnet_and_subnets directory.
The main.tf file below locks down the Azure provider version we have validated this module with, and externalizes variables like the resource names, address_space and the region where the resources need to be created.
The following file may have been abbreviated for brevity. The complete working code can be found here.
# setup azure terraform provider
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "=3.65.0"
}
}
}
# azurerm_resource_group to create azure resource group
# official documentation https://registry.terraform.io/providers/hashicorp/azurerm/3.65.0/docs/resources/resource_group
resource "azurerm_resource_group" "az_rg" {
name = var.resource_group_name
location = var.region
tags = merge(var.tags, var.additional_resource_group_tags)
}
# azurerm_virtual_network to create the azure vnet in the azure resource group
# official documentation https://registry.terraform.io/providers/hashicorp/azurerm/3.65.0/docs/resources/virtual_network
resource "azurerm_virtual_network" "az_vnet" {
name = var.vnet_name
location = azurerm_resource_group.az_rg.location
resource_group_name = azurerm_resource_group.az_rg.name
address_space = var.address_space
tags = merge(var.tags, var.additional_vnet_tags)
}
# azurerm_subnet to create the azure subnet in the azure vnet
# official documentation https://registry.terraform.io/providers/hashicorp/azurerm/3.65.0/docs/resources/subnet
resource "azurerm_subnet" "az_subnet" {
name = var.subnet_name
resource_group_name = azurerm_resource_group.az_rg.name
virtual_network_name = azurerm_virtual_network.az_vnet.name
address_prefixes = var.subnet_address_prefix
service_endpoints = var.service_endpoints
}
The variables.tf file declares the variables accepted as inputs from the user, which you can see referenced as var.* in the main.tf file above.
The following variables.tf may have been abbreviated for brevity. The complete working code can be found here.
variable "resource_group_name" {
type = string
description = "The Name for this Resource Group. Changing this forces a new Resource Group to be created."
}
variable "vnet_name" {
type = string
description = "The name of the virtual network. Changing this forces a new resource to be created."
}
variable "subnet_name" {
type = string
description = "The name of the subnet. Changing this forces a new resource to be created."
}
variable "region" {
type = string
description = "The location/region where the resource group. Changing this forces a new resource to be created. We will create the vnet and subnets in the same location/region where the resource group is."
}
variable "address_space" {
type = list(string)
description = "The address space that is used the virtual network. You can supply more than one address space but for our module implementation we are limiting it to 1 address space only."
default = ["10.1.0.0/16"]
validation {
condition = length(var.address_space) == 1
error_message = "Only a single address space can be set. Please check your subnet address prefixes."
}
}
variable "subnet_address_prefix" {
type = list(string)
description = "The address prefixes to use for the subnet. Currently only a single address prefix can be set as the Multiple Subnet Address Prefixes Feature is not yet in public preview or general availability."
default = ["10.1.0.0/16"]
validation {
condition = length(var.subnet_address_prefix) == 1
error_message = "Only a single address prefix can be set. Please check your subnet address prefixes."
}
}
variable "service_endpoints" {
type = list(string)
description = "The list of Service endpoints to associate with the subnet. Possible values include: Microsoft.AzureActiveDirectory, Microsoft.AzureCosmosDB, Microsoft.ContainerRegistry, Microsoft.EventHub, Microsoft.KeyVault, Microsoft.ServiceBus, Microsoft.Sql, Microsoft.Storage, Microsoft.Storage.Global and Microsoft.Web."
default = []
}
variable "tags" {
type = map(any)
description = "common tags to be assigned to all the resources"
default = {}
}
variable "additional_vnet_tags" {
type = map(any)
description = "additional tags for vnet"
default = {}
}
variable "additional_resource_group_tags" {
type = map(any)
description = "additional tags for resource group"
default = {}
}
The outputs.tf file outputs the IDs that consumers of this module might need, possibly as inputs to other modules. For example, we will need the resource group name and the subnet id as inputs to the aks module below.
The following file may have been abbreviated for brevity. The complete working code can be found here.
output "az_rg_id" {
description = "The ID of the resource group"
value = azurerm_resource_group.az_rg.id
}
output "az_rg_name" {
description = "The ID of the resource group"
value = azurerm_resource_group.az_rg.name
}
output "az_vnet_id" {
description = "The ID of the vnet"
value = azurerm_virtual_network.az_vnet.id
}
output "az_subnet_id" {
description = "The ID of the subnet"
value = azurerm_subnet.az_subnet.id
}
aks
This is an opinionated module to create an AKS cluster with a default nodepool, along with the optional ability to create more worker nodepools. Create the following files under the modules/aks directory.
The main.tf file below locks down the Azure provider version we have validated this module with, and externalizes variables like cluster_name, k8s_version, the nodepools configuration, etc. to create the AKS cluster.
The following file may have been abbreviated for brevity. The complete working code can be found here.
# setup azure terraform provider
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "=3.65.0"
}
}
}
# azurerm_kubernetes_cluster to create k8s cluster
# official documentation https://registry.terraform.io/providers/hashicorp/azurerm/3.65.0/docs/resources/kubernetes_cluster
resource "azurerm_kubernetes_cluster" "k8s" {
name = var.cluster_name
location = var.region
resource_group_name = var.resource_group_name
dns_prefix = var.dns_prefix
kubernetes_version = var.k8s_version
node_resource_group = "aks_${var.cluster_name}_${var.region}"
tags = var.aks_tags
default_node_pool {
name = "system"
type = "VirtualMachineScaleSets"
node_count = 1
vm_size = "Standard_DS2_v2"
zones = [1, 2, 3]
vnet_subnet_id = var.az_subnet_id
only_critical_addons_enabled = true
node_labels = {
"worker-name" = "system"
}
}
identity {
type = "SystemAssigned"
}
network_profile {
network_plugin = var.network_plugin
}
# enable workload identity
oidc_issuer_enabled = true
workload_identity_enabled = true
}
# azurerm_kubernetes_cluster_node_pool to create k8s workers
# official documentation https://registry.terraform.io/providers/hashicorp/azurerm/3.65.0/docs/resources/kubernetes_cluster_node_pool
resource "azurerm_kubernetes_cluster_node_pool" "k8s-worker" {
for_each = var.nodepools
name = each.value.name
kubernetes_cluster_id = azurerm_kubernetes_cluster.k8s.id
vm_size = each.value.vm_size
min_count = each.value.min_count
max_count = each.value.max_count
enable_auto_scaling = each.value.enable_auto_scaling
enable_node_public_ip = each.value.enable_node_public_ip
zones = each.value.zones
vnet_subnet_id = var.az_subnet_id
tags = each.value.tags
node_labels = each.value.node_labels
}
The variables.tf file allows the user to configure the subnet where the AKS cluster and nodepools need to be created, along with the configuration of the nodepools. These are referenced in main.tf as var.*.
The following file may have been abbreviated for brevity. The complete working code can be found here.
variable "cluster_name" {
type = string
description = "aks cluster name"
}
variable "k8s_version" {
type = string
description = "kubernetes version"
default = "1.26"
}
variable "region" {
type = string
description = "azure region where the aks cluster must be created, this region should match where you have created the resource group, vnet and subnet"
}
variable "resource_group_name" {
type = string
description = "azure resource group name where the aks cluster should be created"
}
variable "dns_prefix" {
type = string
description = "DNS prefix specified when creating the managed cluster. Possible values must begin and end with a letter or number, contain only letters, numbers, and hyphens and be between 1 and 54 characters in length. Changing this forces a new resource to be created."
default = "platformwale"
}
variable "az_subnet_id" {
type = string
description = "azure subnet id where the nodepools and aks cluster need to be created"
}
variable "network_plugin" {
type = string
description = "Network plugin to use for networking. Currently supported values are azure, kubenet and none. Changing this forces a new resource to be created."
default = "azure"
}
variable "aks_tags" {
type = map(any)
description = "tags for the aks cluster"
default = {}
}
variable "nodepools" {
description = "Nodepools for the Kubernetes cluster"
type = map(object({
name = string
zones = list(number)
vm_size = string
min_count = number
max_count = number
enable_auto_scaling = bool
enable_node_public_ip = bool
tags = map(string)
node_labels = map(string)
}))
default = {
worker = {
name = "worker"
zones = [1, 2, 3]
vm_size = "Standard_D2_v2"
min_count = 1
max_count = 100
enable_auto_scaling = true
enable_node_public_ip = true
tags = { worker_name = "worker" }
node_labels = { "worker-name" = "worker" }
}
}
}
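To illustrate the shape of the nodepools map, here is a hypothetical value that replaces the single default worker pool with two differently sized pools. The pool names and VM sizes are examples only, and in this setup you would also need to surface the nodepools variable through the cluster and root modules (the root main.tf we create later only forwards cluster_name, k8s_version and region), so treat this purely as a sketch:
# hypothetical nodepools value -- two pools instead of the single default "worker" pool
nodepools = {
  general = {
    name                  = "general"
    zones                 = [1, 2, 3]
    vm_size               = "Standard_D4s_v3"
    min_count             = 2
    max_count             = 10
    enable_auto_scaling   = true
    enable_node_public_ip = false
    tags                  = { worker_name = "general" }
    node_labels           = { "worker-name" = "general" }
  }
  compute = {
    name                  = "compute"
    zones                 = [1, 2, 3]
    vm_size               = "Standard_F8s_v2"
    min_count             = 1
    max_count             = 20
    enable_auto_scaling   = true
    enable_node_public_ip = false
    tags                  = { worker_name = "compute" }
    node_labels           = { "worker-name" = "compute" }
  }
}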
The outputs.tf file outputs the values that may be useful to the end user. You may observe that the client_certificate and kube_config outputs are marked sensitive = true, which prevents any sensitive information from being printed on stdout, though it doesn't prevent it from being stored in the tfstate file.
The following file may have been abbreviated for brevity. The complete working code can be found here.
output "cluster_id" {
description = "The Kubernetes Managed Cluster ID."
value = azurerm_kubernetes_cluster.k8s.id
}
output "client_certificate" {
description = "Base64 encoded public certificate used by clients to authenticate to the Kubernetes cluster."
value = azurerm_kubernetes_cluster.k8s.kube_config.0.client_certificate
sensitive = true
}
output "kube_config" {
description = "Raw Kubernetes config to be used by kubectl and other compatible tools."
value = azurerm_kubernetes_cluster.k8s.kube_config_raw
sensitive = true
}
output "oidc_issuer_url" {
description = "The OIDC issuer URL that is associated with the cluster"
value = azurerm_kubernetes_cluster.k8s.oidc_issuer_url
}
output "node_resource_group" {
description = "The auto-generated Resource Group which contains the resources for this Managed Kubernetes Cluster."
value = azurerm_kubernetes_cluster.k8s.node_resource_group
}
output "node_resource_group_id" {
description = "The auto-generated Resource Group which contains the resources for this Managed Kubernetes Cluster."
value = azurerm_kubernetes_cluster.k8s.node_resource_group_id
}
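Because kube_config is marked sensitive, Terraform redacts it in plan and apply output. Once the root module (which re-exports this output, as we will see below) has been applied, you can still read it explicitly when needed, for example to write a kubeconfig file. A minimal sketch, run from the root module:
# read a sensitive output explicitly (after terraform apply)
terraform output -raw kube_config > kubeconfig-aks
export KUBECONFIG="$PWD/kubeconfig-aks"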
cluster module
In the sections above we created the modules (the APIs); now it's time to invoke them from a consolidated module named cluster. You can imagine this module as being written by a client of the Platform team, such as an application team wanting to claim infrastructure resources. This module is further opinionated, catering to the needs of that application team.
The main.tf below accepts cluster_name as an input and uses the same name for resource_group_name, vnet_name, subnet_name and cluster_name. Similarly, the same address_space (CIDR block) is used for both the subnet and the vnet. You can also observe that the aks_with_node_group module uses az_rg_name (resource group name) and az_subnet_id (subnet id) from the vnet_with_subnets module's outputs; this creates an implicit dependency on the vnet_with_subnets module, which means the aks_with_node_group module will wait for the vnet_with_subnets module to finish before executing.
The following file may have been abbreviated for brevity. The complete working code can be found here.
# invoking vnet and subnets modules
module "vnet_with_subnets" {
# invoke vnet_and_subnets module under modules directory
source = "../modules/vnet_and_subnets"
# create resource group, vnet and subnet with the same name as cluster name
resource_group_name = var.cluster_name
vnet_name = var.cluster_name
subnet_name = var.cluster_name
# location where the resources need to be created
region = var.region
address_space = var.address_space
subnet_address_prefix = var.address_space
}
# invoking aks module to create aks cluster and node group
module "aks_with_node_group" {
# invoke aks module under modules directory
source = "../modules/aks"
cluster_name = var.cluster_name
k8s_version = var.k8s_version
region = var.region
dns_prefix = var.cluster_name
resource_group_name = module.vnet_with_subnets.az_rg_name
az_subnet_id = module.vnet_with_subnets.az_subnet_id
nodepools = var.nodepools
}
The variables.tf file accepts fewer parameters than the vnet and aks modules we saw earlier, because the main.tf above is written in an opinionated manner catering to the needs of one team. Each team can write its own version of this module.
The following file may have been abbreviated for brevity. The complete working code can be found here.
variable "cluster_name" {
type = string
description = "resource group, vnet, subnet and aks cluster name"
}
variable "k8s_version" {
type = string
description = "kubernetes version"
default = "1.26"
}
variable "region" {
type = string
description = "azure region where the aks cluster must be created, this region should match where you have created the resource group, vnet and subnet"
}
variable "address_space" {
type = list(string)
description = "The address space that is used the virtual network. You can supply more than one address space but for our module implementation we are limiting it to 1 address space only."
default = ["10.1.0.0/16"]
}
variable "nodepools" {
description = "Nodepools for the Kubernetes cluster"
type = map(object({
name = string
zones = list(number)
vm_size = string
min_count = number
max_count = number
enable_auto_scaling = bool
enable_node_public_ip = bool
tags = map(string)
node_labels = map(string)
}))
default = {
worker = {
name = "worker"
zones = [1, 2, 3]
vm_size = "Standard_D2_v2"
min_count = 1
max_count = 100
enable_auto_scaling = true
enable_node_public_ip = true
tags = { worker_name = "worker" }
node_labels = { "worker-name" = "worker" }
}
}
}
The outputs.tf file only surfaces the values the team may need. The following file may have been abbreviated for brevity. The complete working code can be found here.
output "cluster_id" {
description = "The Kubernetes Managed Cluster ID."
value = module.aks_with_node_group.cluster_id
}
output "client_certificate" {
description = "Base64 encoded public certificate used by clients to authenticate to the Kubernetes cluster."
value = module.aks_with_node_group.client_certificate
sensitive = true
}
output "kube_config" {
description = "Raw Kubernetes config to be used by kubectl and other compatible tools."
value = module.aks_with_node_group.kube_config
sensitive = true
}
output "oidc_issuer_url" {
description = "The OIDC issuer URL that is associated with the cluster"
value = module.aks_with_node_group.oidc_issuer_url
}
output "node_resource_group" {
description = "The auto-generated Resource Group which contains the resources for this Managed Kubernetes Cluster."
value = module.aks_with_node_group.node_resource_group
}
output "node_resource_group_id" {
description = "The auto-generated Resource Group which contains the resources for this Managed Kubernetes Cluster."
value = module.aks_with_node_group.node_resource_group_id
}
output "az_rg_id" {
description = "The ID of the resource group"
value = module.vnet_with_subnets.az_rg_id
}
output "az_rg_name" {
description = "The name of the resource group"
value = module.vnet_with_subnets.az_rg_name
}
output "az_vnet_id" {
description = "The ID of the vnet"
value = module.vnet_with_subnets.az_vnet_id
}
output "az_subnet_id" {
description = "The ID of the subnet"
value = module.vnet_with_subnets.az_subnet_id
}
Prepare to invoke the cluster module
Now we are at the final stage, where members of a team may want to invoke the cluster module for various use cases. For example, we may want to create dev, stage and prod AKS clusters.
The main.tf below only overrides the cluster_name, k8s_version and region variables of the cluster module we created above, and uses the other default values.
Along with that, it sets the Terraform backend to store the tfstate file in S3. The backend is configured at initialization time via terraform init in the section below. We have explained this in our earlier blog on how to Create EKS cluster within its VPC.
This file also configures the azurerm provider. You will see in the section below that we override the required parameters by setting some environment variables, to make sure Terraform creates the resources in the desired Azure account/subscription. You will also notice that the cluster module invocation points its source at the cluster module we created in the section above.
# to use s3 backend
# s3 bucket is configured at command line
terraform {
backend "s3" {}
}
provider "azurerm" {
# The AzureRM Provider supports authenticating using via the Azure CLI, a Managed Identity
# and a Service Principal. More information on the authentication methods supported by
# the AzureRM Provider can be found here:
# https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs#authenticating-to-azure
# The features block allows changing the behaviour of the Azure Provider, more
# information can be found here:
# https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/guides/features-block
features {}
}
# invoke cluster module which creates resource group, vnet, subnets and aks cluster with a default nodepool
# by default cluster module also creates a nodepool named worker
module "cluster" {
source = "./cluster"
region = var.region
cluster_name = var.cluster_name
k8s_version = var.k8s_version
}
The variables.tf and outputs.tf files are as follows. The actual files are here: variables.tf and outputs.tf.
variable "region" {
type = string
description = "aks region where the resources are being created"
}
variable "cluster_name" {
type = string
description = "aks cluster name, same name is used for resource group, vnet and subnets"
default = "platformwale"
}
variable "k8s_version" {
type = string
description = "k8s version"
default = "1.26"
}
output "kube_config" {
description = "Raw Kubernetes config to be used by kubectl and other compatible tools."
value = module.cluster.kube_config
sensitive = true
}
output "oidc_issuer_url" {
description = "The OIDC issuer URL that is associated with the cluster"
value = module.cluster.oidc_issuer_url
}
Now we also need to create a .tfvars file. You can imagine this as the input file used while invoking the module; this way you can get different behavior based on your requirements. For example, as discussed earlier, you may have dev.tfvars, stage.tfvars and prod.tfvars for environment-specific clusters with distinct configurations. The following is the sample.tfvars which we will use in the sections below for provisioning the infrastructure. The complete code can be found here.
# azure region
region = "westus2"
# aks cluster name, this is the same name used to create the resource group as well as vnet
# hence this name must be unique
cluster_name = "platformwale"
With all these modules in place, we are now all set to see the infrastructure for the AKS cluster come to life. Please refer to the sections below for further instructions.
Setting Up the Environment
Before we dive into creating our resources, let's authenticate the Azure CLI with our Azure account:
az login
Set the following environment variables so that the AKS cluster is created in the designated subscription of your Azure account.
export ARM_CLIENT_ID="The Client ID which should be used."
export ARM_CLIENT_SECRET="The Client Secret which should be used."
export ARM_SUBSCRIPTION_ID="The Subscription ID which should be used."
export ARM_TENANT_ID="The Tenant ID which should be used."
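If you are unsure where these values come from: the subscription and tenant IDs are visible from your Azure CLI session, and a service principal's appId/password map to ARM_CLIENT_ID/ARM_CLIENT_SECRET. A sketch, assuming you authenticate Terraform with a service principal (the Contributor role scope below is an example):
# subscription and tenant id of the account you logged into with az login
az account show --query "{subscriptionId:id, tenantId:tenantId}" -o table
# optionally create a service principal scoped to the subscription;
# the returned appId and password map to ARM_CLIENT_ID and ARM_CLIENT_SECRET
az ad sp create-for-rbac --role "Contributor" --scopes "/subscriptions/<your-subscription-id>"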
Deploy and Validate
In this section we will look at how to execute the Terraform modules we prepared above to create the AKS cluster within its VNET, connect to the cluster, and deploy the nginx Helm chart to validate the cluster's functionality.
- Create an s3 bucket to store the tfstate file
aws s3api create-bucket --bucket "your-bucket-name" --region "your-aws-region"
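One caveat worth noting: for any AWS region other than us-east-1, the S3 API requires an explicit location constraint, so the call would look like this (region shown is an example):
# outside us-east-1 the bucket location must be stated explicitly
aws s3api create-bucket --bucket "your-bucket-name" --region "eu-west-1" --create-bucket-configuration LocationConstraint=eu-west-1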
- Initialize the terraform module. Run this from the root of my-aks-tf, where you have prepared the terraform files that invoke the cluster module.
# tfstate file name
tfstate_file_name="<some name e.g. aks-1111111111>"
# tfstate s3 bucket name, this will hold the tfstate file which you can reuse for further runs of this terraform module,
# for example to upgrade the k8s version or add new node pools. The bucket name must be unique as S3 bucket names are global.
# The bucket must already exist -- we created it in the previous step.
tfstate_bucket_name="unique s3 bucket name you created above e.g. my-tfstate-<myname>"
# initialize the terraform module
terraform init -backend-config "key=${tfstate_file_name}" -backend-config "bucket=${tfstate_bucket_name}" -backend-config "region=us-east-1"
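Also note that, because the state backend is S3 while the resources live in Azure, Terraform needs working AWS credentials in addition to the ARM_* variables above, and the region passed via -backend-config should match the region where you created the bucket. A quick way to verify the AWS credentials:
# confirm the AWS credentials the S3 backend will use are in place
aws sts get-caller-identity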
- Retrieve the terraform plan, a preview of what will happen when you apply this terraform module. This is a best practice for understanding the change.
terraform plan -var-file="path/to/your/terraform.tfvars"
# example
terraform plan -var-file="sample.tfvars"
- If you are satisfied with the plan above, this is the final step: apply the terraform and wait for the resources to be created. It will take roughly 20 minutes for all the resources to come up.
terraform apply -var-file="path/to/your/terraform.tfvars"
# example
terraform apply -var-file="sample.tfvars"
- After successful cluster creation, retrieve the kubeconfig, connect to the AKS cluster and validate that the kubeconfig context is now pointing to the new cluster.
az aks get-credentials --resource-group "<my resource group name>" --name "<my aks cluster name>" --subscription "<subscription where the resources are created>"
# as per the sample.tfvars parameters
az aks get-credentials --resource-group "platformwale" --name "platformwale" --subscription "${ARM_SUBSCRIPTION_ID}"
# validate that the kubeconfig context is pointing to the new cluster
kubectl config current-context
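Optionally, you can also confirm that the nodepools registered with the cluster; the worker-name label column below comes from the node_labels we set in the aks module:
# list nodes with their worker-name label (system and worker pools)
kubectl get nodes -L worker-name -o wide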
- Install the nginx Helm chart. This creates a LoadBalancer service, which proves the functionality of the AKS cluster once the nginx pods come up successfully.
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install -n default nginx bitnami/nginx
# validate nginx pod and load balancer service
kubectl get pods -n default
kubectl get svc -n default
# example output of the commands above
$ kubectl get pods -n default
NAME READY STATUS RESTARTS AGE
nginx-7c8ff57685-ck9pn 1/1 Running 0 3m31s
$ kubectl get svc -n default nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx LoadBalancer 10.0.80.50 XX.XXX.XXX.X 80:30149/TCP 77s
You can now open http://<EXTERNAL-IP>:80 in a browser and see the nginx welcome page as below -
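If you prefer the command line, a quick curl against the external IP works just as well (a small sketch using kubectl's jsonpath output):
# fetch the service's external IP and request the nginx welcome page
EXTERNAL_IP="$(kubectl get svc nginx -n default -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
curl -s "http://${EXTERNAL_IP}" | head -n 5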
Clean Up
When you're done with your resources, you can destroy them with the following commands. This is an extremely important step; otherwise you will see unexpected costs for these resources in your account.
# uninstall nginx helm chart to make sure load balancer is deleted
helm uninstall -n default nginx
# destroy infrastructure
terraform destroy -var-file="sample.tfvars"
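Once the destroy completes, you can double-check that nothing was left behind. With the sample.tfvars values used above, the two resource groups to verify are the one we created and the AKS-managed node resource group (named aks_<cluster_name>_<region> per the aks module):
# both should return "false" once the destroy has finished
az group exists --name "platformwale"
az group exists --name "aks_platformwale_westus2"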
Conclusion
There you have it! You've successfully created an AKS cluster within a VNET using Terraform. With the power of IaC, you can easily manage, replicate, and version control your infrastructure. Happy Terraforming!
Please note that this tutorial is a basic guide, and best practices such as state management, data security, and others are not covered here. We recommend further study to understand and implement these practices for production-level projects.
Author Notes
Feel free to reach out with any concerns or questions you have, either on the GitHub repository or directly on this blog. I will make every effort to address your inquiries and provide resolutions. Stay tuned for the upcoming blog in this series dedicated to Platformwale (Engineers who work on Infrastructure Platform teams).
Originally published at platformwale.blog on July 20, 2023