Drift Detection and Performance in Terraform

Drift occurs when the actual state of your infrastructure differs from what is defined in your Terraform configuration files. It can have many causes; a common one is changing the infrastructure manually through the cloud provider's portal. To understand how Terraform deals with drift and keeps infrastructure consistent with the configuration, let's go through a few of its capabilities.

Terraform State

According to the Terraform documentation, the primary purpose of the state file is to store the bindings between objects in a remote system and the resource instances declared in your configuration files. The state file can be stored either locally or remotely; the location is defined by a backend block inside the terraform block.

When you write Terraform code from scratch, the state file (named terraform.tfstate when stored locally) is created the first time you run "terraform apply".
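
As a rough illustration of the backend configuration described above, the sketch below stores the state remotely in an S3 bucket. The bucket name, key, and region are hypothetical placeholders rather than values from this article. Without any backend block, Terraform keeps terraform.tfstate in the working directory.

```hcl
terraform {
  # Remote state: assumes an S3 bucket that already exists.
  backend "s3" {
    bucket = "example-terraform-state"   # hypothetical bucket name
    key    = "network/terraform.tfstate" # hypothetical path of the state object
    region = "us-east-1"                 # assumed region
  }
}
```

After adding or changing a backend block, run "terraform init" so that Terraform can initialize the backend.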

Drift Detection

Before running the plan, apply, and destroy commands, Terraform performs a refresh by default: it makes API calls to the respective cloud provider to read the actual state of the infrastructure.

Terraform Refresh

1. Running "terraform refresh" modifies the state file but does not modify infrastructure. If the state changes, the next plan or apply may propose changes as a result. Note that Terraform v0.15.4 and later deprecate this command in favor of "terraform apply -refresh-only", which shows the detected changes and asks for approval before writing them to the state.

2. Refresh can be used to detect drift from the last-known state and to update the state file. If the infrastructure has drifted since Terraform last ran, a refresh is what allows that drift to be detected.

Resource Lifecycle

Terraform provides some lifecycle configuration options for every resource, regardless of provider, that give you more control over how Terraform reconciles your desired configuration against state when generating plans.

The lifecycle block accepts several arguments; the three covered here control how Terraform handles changes to your resource.

prevent_destroy: When this is set to true, any plan that includes a destroy of this resource will return an error message. This flag provides extra protection against the destruction of a given resource.

ignore_changes: This argument tells Terraform which individual attributes to ignore when evaluating changes.

create_before_destroy: This flag ensures that the replacement for a resource is created before the original instance is destroyed. It is particularly useful for achieving zero downtime.

resource "resource_type" "resource_local_name" {
  count  = 2
  ...
  lifecycle {
    prevent_destroy       = true
    ignore_changes        = [ami]
    create_before_destroy = true
  }
}
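
For a more concrete picture, here is a minimal sketch that assumes the AWS provider. The resource name, AMI ID, and instance type are illustrative assumptions. With a configuration like this, a tag edited by hand in the console should not be proposed for reversal in the next plan, because differences in tags are ignored.

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type = "t3.micro"              # assumed instance type

  tags = {
    Name = "web"
  }

  lifecycle {
    # Differences in tags (for example, manual edits) are ignored when planning.
    ignore_changes = [tags]

    # If a change forces replacement, create the new instance before
    # destroying the old one.
    create_before_destroy = true
  }
}
```

Use ignore_changes sparingly: every ignored attribute is a place where drift will no longer be reported.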

Note: The goal of terraform plan is to compare the configuration files against the current state file and read any outputs related to the current configuration. Although terraform plan performs a refresh by default, it does not itself write any changes to the state file.

Scenario 01

If someone runs terraform plan with the "-refresh=false" flag, Terraform does not fetch the latest state of your real-world infrastructure. It only compares the state file with the configuration files, so it will not detect any drift between the real-world infrastructure and the configuration.

Scenario 02

If someone runs terraform apply with the "-refresh=false" flag, Terraform decides what to do purely by comparing the current state file with the configuration files, for example creating resources that were newly added to the configuration. It will not detect any drift between your real-world infrastructure and the configuration, because the synchronisation of resources has been disabled.

Note: Since Terraform 0.13, you can no longer disable the synchronisation of data resources. You can still disable the synchronisation of managed resources declared with resource blocks by using the "-refresh=false" flag.
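
To make the distinction in this note concrete, the sketch below puts a data resource next to a managed resource. It assumes the AWS provider; the names, the AMI name pattern, and the instance type are illustrative assumptions, not taken from this article.

```hcl
# Data resource: still read during planning, even with -refresh=false.
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"] # assumed AMI name pattern
  }
}

# Managed resource: this is the kind of object whose synchronisation
# -refresh=false skips.
resource "aws_instance" "app" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro" # assumed instance type
}
```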

Scenario 03

As we know, every time plan, apply, or destroy is called, Terraform refreshes the deployment state. It makes multiple calls to the cloud provider API to verify which resources exist and which don't, and to refresh the data sources.

For larger infrastructures, querying every resource is too slow. Many cloud providers do not provide APIs to query multiple resources at once, and the round trip time for each resource is hundreds of milliseconds.

That is why people sometimes use the "-refresh=false" flag with "terraform destroy" to make the destroy operation faster. In these scenarios, the cached state is treated as the record of truth.

Performance Improvement

Resource Targeting

The -target option can be used to focus Terraform's attention on only a subset of resources.

This targeting capability is provided for exceptional circumstances, such as recovering from mistakes or working around Terraform limitations. It is not recommended to use -target for routine operations, since this can lead to undetected configuration drift and confusion about how the true state of resources relates to configuration.

Resource Graph

Terraform builds a dependency graph from the Terraform configurations, and walks this graph to generate plans, refresh state, and more. To walk the graph, a standard depth-first traversal is done. Graph walking is done in parallel: a node is walked as soon as all of its dependencies are walked.
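
A small configuration shows where graph edges come from. This is a sketch that assumes the AWS provider, with illustrative resource names and CIDR ranges. Each subnet references the VPC's ID, so both subnets depend on the VPC, but they do not depend on each other.

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# The reference to aws_vpc.main.id creates an edge in the graph,
# so this subnet is walked only after the VPC.
resource "aws_subnet" "a" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

# Same dependency on the VPC, but none on aws_subnet.a, so the two
# subnets can be walked in parallel once the VPC is done.
resource "aws_subnet" "b" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.2.0/24"
}
```

Running "terraform graph" prints the dependency graph in DOT format, which is a convenient way to inspect these edges.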

To reduce the time it takes to provision resources, Terraform uses parallelism. By default, the plan, apply, and destroy commands run with a concurrency of 10, which means up to 10 nodes in the graph are processed concurrently.

Setting "-parallelism" is considered an advanced operation and should not be necessary for normal usage of Terraform; it is meant for specific use cases only.

References

https://www.terraform.io/docs/internals/graph.html#walking-the-graph
https://www.terraform.io/docs/enterprise/system-overview/capacity.html
https://www.terraform.io/docs/language/state/purpose.html#performance
https://www.hashicorp.com/blog/detecting-and-managing-drift-with-terraform#Summary
https://www.terraform.io/docs/cli/commands/plan.html#resource-targeting
