May 29th - Blog Subscription Service - Part 2: OIDC Auth and the Terraform Pipeline
Getting Terraform to run in CI without storing AWS credentials anywhere is one of those things that sounds complicated until you understand the pattern. This post covers how GitLab authenticates to AWS using OIDC, and how the CI pipeline is structured to plan on any branch and apply only on main.
The Problem with Static Credentials
The lazy approach to Terraform in CI is to create an IAM user, generate an access key, and paste it into a CI variable. It works, but it comes with real downsides:
- Long-lived credentials that can leak
- Manual rotation burden
- No automatic expiry
OIDC solves all of this. GitLab acts as an identity provider, and AWS grants temporary credentials to CI jobs that can prove they came from the right project and branch. The token is short-lived and tied to a single job, so there is nothing to rotate and nothing long-lived to leak.
How OIDC Works Here
When a CI job runs, GitLab injects a short-lived JWT into $GITLAB_OIDC_TOKEN. That token is a signed assertion from GitLab that says "this job ran in project X, on branch Y, triggered by event Z." AWS STS accepts that token via AssumeRoleWithWebIdentity and returns temporary credentials scoped to the role.
The CI job writes the token to a file and sets an environment variable so the AWS SDK picks it up automatically:
before_script:
  - echo "$GITLAB_OIDC_TOKEN" > /tmp/web-identity-token
  - export AWS_WEB_IDENTITY_TOKEN_FILE=/tmp/web-identity-token
  - export AWS_ROLE_ARN=$AWS_ROLE_ARN
  - export AWS_REGION=us-east-1
Writing the token to a file rather than passing it as a shell argument keeps it out of process listings and CI logs. The AWS SDK exchanges it for short-lived credentials automatically on the first API call, so no explicit sts assume-role-with-web-identity call is needed.
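On current GitLab versions, an ID token such as GITLAB_OIDC_TOKEN is only populated when the job declares it with the id_tokens keyword. A minimal sketch of such a job follows. The job name and script line are placeholders, and the aud value is assumed to match the audience registered on the AWS identity provider.

```yaml
# Sketch only: job name and script line are placeholders.
tf:plan:
  id_tokens:
    GITLAB_OIDC_TOKEN:
      aud: sts.amazonaws.com   # assumed to match the audience on the AWS identity provider
  before_script:
    - echo "$GITLAB_OIDC_TOKEN" > /tmp/web-identity-token
    - export AWS_WEB_IDENTITY_TOKEN_FILE=/tmp/web-identity-token
  script:
    - terraform plan
```

The variable name under id_tokens is free-form, so it only has to match whatever before_script writes to the token file.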
Setting Up the Trust Relationship
1. Register GitLab as an OIDC provider in AWS
Done once via the AWS Console: IAM → Identity providers → Add provider → OpenID Connect.
- Provider URL: https://gitlab.com
- Audience: sts.amazonaws.com
AWS validates well-known providers against its own root CA store, so no manual thumbprint retrieval is needed.
2. Create the IAM role
IAM → Roles → Create role → Web identity. The role is named your-role-name, scoped to the your-group/your-project project. The reference path is left blank, meaning any branch can assume the role. Restricting apply to main is handled in CI rules instead, which lets tf:plan run on feature branches to validate changes before merging.
The trust condition this generates:
gitlab.com:sub = project_path:your-group/your-project:ref_type:branch:ref:*
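For context, that condition sits inside the role's trust policy. The sketch below writes the policy as YAML to stay consistent with the other snippets in this post; IAM itself stores it as JSON with the same keys. The account ID 123456789012 is a placeholder, and the aud condition is an assumption based on the audience registered in step 1.

```yaml
# Sketch only: account ID is a placeholder; IAM stores this document as JSON.
Version: "2012-10-17"
Statement:
  - Effect: Allow
    Principal:
      Federated: arn:aws:iam::123456789012:oidc-provider/gitlab.com
    Action: sts:AssumeRoleWithWebIdentity
    Condition:
      StringEquals:
        "gitlab.com:aud": sts.amazonaws.com
      # StringLike is needed because the ref ends in a wildcard.
      StringLike:
        "gitlab.com:sub": "project_path:your-group/your-project:ref_type:branch:ref:*"
```

Tightening the role to main only would mean replacing the trailing * with main, at the cost of tf:plan no longer working on feature branches.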
3. Scope permissions with a boundary and inline policy
The lazy part of my brain of course wanted to use AdministratorAccess, but we can do better. The role gets two things:
- Permission boundary (your-role-name-boundary): a hard cap on what the role can ever do, even if someone attaches a broader policy later. Scoped to the exact services this project uses.
- Inline policy (your-inline-policy-name): the actual permissions needed to deploy: Lambda, API Gateway V2, DynamoDB, SES, EventBridge, ACM, Route53, CloudWatch Logs, IAM (scoped to your-prefix-* roles only), and S3 (scoped to the state bucket).
The important thing about permission boundaries: both the boundary and the inline policy must allow an action for it to go through. When debugging, check both. I hit this multiple times: once when the boundary had a one-character typo in the bucket name, and again when iam:CreateServiceLinkedRole was missing. Both surfaced as 403 AccessDeniedException errors mid-apply.
A few permissions that were missed until the first deploy (required in both policy and boundary):
- events:ListTagsForResource: Terraform reads tags when refreshing existing EventBridge rules
- iam:CreateServiceLinkedRole (scoped to API Gateway SLR): API Gateway needs a service-linked role on first use in a region
- Several Lambda read permissions (lambda:ListVersionsByFunction, lambda:GetFunctionCodeSigningConfig, etc.): all called by the AWS provider when refreshing existing Lambda state
The CI Pipeline
Jobs
- tf:validate: runs automatically on any branch when infra/ changes. Catches syntax errors cheaply before a plan.
- tf:plan: auto on MR pipelines and on main after merge when infra/ changes (so the diff is visible in the MR UI before merging); manual on other branch pushes with infra/ changes (this manual plan could be automatic but I do stupid things and I don't want plans running for no reason).
- tf:apply: manual trigger, restricted to main only, when infra/ changes. Depends on the tf:plan artifact from the same pipeline; artifacts don't cross pipeline boundaries, so plan and apply must run in the same main pipeline. (I also plan to create some policy checks in the future that will enable automatic apply in certain, non-destructive situations.)
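One way to express those three jobs in .gitlab-ci.yml is sketched below. It is not the exact pipeline from this project: the stage names, the infra/**/* glob, the -chdir=infra layout, and the tfplan file name are all assumptions made for illustration.

```yaml
# Sketch only: stage names, globs, and script lines are assumptions.
stages: [validate, plan, apply]

tf:validate:
  stage: validate
  script:
    - terraform -chdir=infra init -backend=false
    - terraform -chdir=infra validate
  rules:
    - changes:
        - infra/**/*          # any branch, but only when infra/ changes

tf:plan:
  stage: plan
  script:
    - terraform -chdir=infra init -input=false
    - terraform -chdir=infra plan -input=false -out=tfplan
  artifacts:
    paths:
      - infra/tfplan          # saved plan handed to tf:apply
  rules:
    # Automatic on merge request pipelines.
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      changes:
        - infra/**/*
    # Automatic on main after merge.
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      changes:
        - infra/**/*
    # Manual on every other branch push.
    - if: $CI_COMMIT_BRANCH
      changes:
        - infra/**/*
      when: manual
      allow_failure: true     # an unplayed manual job does not block the pipeline

tf:apply:
  stage: apply
  needs:
    - job: tf:plan
      artifacts: true         # pulls the saved plan from this same pipeline
  script:
    - terraform -chdir=infra init -input=false
    - terraform -chdir=infra apply -input=false tfplan
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      changes:
        - infra/**/*
      when: manual
```

Rules are evaluated top to bottom and the first match wins, which is why the main-branch rule for tf:plan has to come before the catch-all branch rule.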
The plan produces a JSON report (terraform show -json tfplan > tfplan.json) uploaded as a reports: terraform artifact. GitLab (in theory) renders this as a summary in the MR UI showing resources to add, change, or destroy.
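A minimal sketch of the artifact wiring for that report, assuming the Terraform code lives under infra/ and the plan file is called tfplan:

```yaml
# Sketch only: the infra/ paths are assumptions about the repository layout.
tf:plan:
  script:
    - terraform -chdir=infra plan -input=false -out=tfplan
    # -chdir only changes Terraform's working directory; the shell redirect
    # is still resolved from the project root, hence the infra/ prefix.
    - terraform -chdir=infra show -json tfplan > infra/tfplan.json
  artifacts:
    paths:
      - infra/tfplan            # binary plan consumed by tf:apply
    reports:
      terraform: infra/tfplan.json
```

Keeping the binary plan under paths and the JSON under reports separates what tf:apply consumes from what the MR UI reads.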
One gotcha worth noting: the Terraform Docker image sets its entrypoint to terraform, which breaks all non-terraform shell commands in before_script. The fix:
image:
  name: hashicorp/terraform:1.15.2
  entrypoint: [""]
Variables
Two variables stored in GitLab CI settings:
- AWS_ROLE_ARN: the ARN of the your-role-name role. Not a credential, so it's fine unprotected, but set as masked.
- TF_VAR_route53_zone_id: the Route53 hosted zone ID for mydomain.com. Also unprotected and masked.
Both must be unprotected because tf:plan runs on feature branches, and protected variables are only injected into pipelines on protected branches or tags. Restricting apply to main in CI rules is sufficient access control. There are ways to lock this down tighter, but this should be sufficient to protect me from myself.
The TF_VAR_ prefix is the clean way to pass Terraform variables through CI: Terraform reads TF_VAR_route53_zone_id from the environment as the value of the route53_zone_id variable, without any extra configuration.
What's Next
Part 3 goes deeper into the Terraform itself: the actual resources, how they're organized, and a few design decisions worth explaining.
- Part 1: The Why and the What
- Part 3: Terraform infrastructure deep dive
- Part 4: The Lambda Code
This is part of the Blog Subscription Service series.
