AI Services Hub
Azure Landing Zone Infrastructure

GitHub Actions Automation Workflows

This page explains the GitHub Actions workflows used by the repository. These workflows build containers, validate infrastructure changes, and deploy platform resources by using passwordless sign-in to Azure through OpenID Connect.

Workflow Overview

Workflow Trigger Purpose
.builds.yml Called by other workflows Reusable container build workflow for azure-proxy images
.deployer.yml Called by other workflows Reusable Terraform deployer for initial-setup/infra (tools bootstrap and module-level operations)
.deployer-using-secure-tunnel.yml Called by other workflows Reusable Terraform deployer for infra-ai-hub through Chisel + Privoxy secure tunnel
.lint.yml Called by other workflows Reusable validation: pre-commit (terraform fmt + tflint), conventional commits, fork check
add-or-remove-module.yml Manual (workflow_dispatch) Deploy or destroy selected tools modules (bastion, azure_proxy, jumpbox, github_runners_aca)
manual-dispatch.yml Manual (workflow_dispatch) Run plan/apply/destroy for dev/test/prod; prod apply requires a semver tag (e.g. v1.2.3) and creates a GitHub Release
merge-main.yml Push to main Automatic post-merge: semantic version tag via conventional commits, apply infrastructure to test
pr-open.yml Pull request events + manual trigger PR validation: lint, container builds, deploy proxy in tools, and plan against test
schedule.yml Cron (daily at 5 PM PST) Auto-destroy Bastion for cost savings
pages.yml Push to main (docs and Terraform roots) + manual trigger Generate docs and deploy GitHub Pages site

Tenant Onboarding Portal Workflows

The tenant onboarding portal keeps its Terraform in tenant-onboarding-portal/infra, while the application code lives under tenant-onboarding-portal/backend and tenant-onboarding-portal/frontend. GitHub Actions provisions App Service from that infra/ root, builds the frontend separately, copies the SPA into the backend deployment bundle, and deploys the backend package to App Service via zip deployment.
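The copy-then-zip step can be sketched as follows. The wwwroot/ target and every file name below are illustrative assumptions (the real bundle layout lives in the workflow), and a throwaway demo tree stands in for the actual checkout:

```shell
set -eu

# Demo tree standing in for tenant-onboarding-portal/ (file names here are
# illustrative, not the repo's actual layout)
SRC=$(mktemp -d)
mkdir -p "$SRC/backend" "$SRC/frontend/dist"
echo 'backend code'     > "$SRC/backend/app.py"
echo '<html>spa</html>' > "$SRC/frontend/dist/index.html"

# Copy backend sources, then drop the built SPA into the bundle so a
# single zip serves both the API and the frontend
STAGE=$(mktemp -d)
cp -r "$SRC/backend/." "$STAGE/"
mkdir -p "$STAGE/wwwroot"
cp -r "$SRC/frontend/dist/." "$STAGE/wwwroot/"

ls -R "$STAGE"
# CI then zips $STAGE and ships it with a zip deployment:
#   az webapp deploy --resource-group <rg> --name <app> \
#     --src-path portal.zip --type zip
```

`az webapp deploy --type zip` is the standard CLI form of the zip deployment the workflows perform.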

Workflow Portal-specific behavior
pr-open.yml Detects changes anywhere under tenant-onboarding-portal/, provisions a preview App Service in tools, builds frontend/ and backend/, then deploys the backend zip for PR validation.
pr-close.yml Destroys the preview portal environment created for the PR after the pull request is closed.
merge-main.yml On merges to main, detects portal changes, provisions the shared tools App Service if needed, builds the frontend and backend, deploys to the staging slot, health-checks it, then swaps staging into production.
portal-deploy.yml Manual tools redeploy for the same backend/frontend bundle, following the same provision, staging deploy, health check, and slot-swap path as the automated main deployment.

Developer SDLC Flow (Branch → PR → TEST → PROD)

The repository enforces a promote-through-environments workflow. Every change flows through a validated path before reaching production.

  1. Create feature branch from main (e.g. feat/<work-item>, fix/<issue>) and commit changes.
  2. Open PR to main, which triggers pr-open.yml:
    • Lint (pre-commit: terraform fmt + tflint, conventional commit title, fork check)
    • Container image builds
    • Terraform plan against test (via secure tunnel) — summary appended to PR description
  3. Merge to main once PR checks and code review pass.
  4. Auto-apply to test + semver tag: merge-main.yml starts two jobs concurrently — semantic versioning (conventional commit history → v1.2.3 tag + CHANGELOG.md update) and proxy bootstrap in tools. Container images are then re-tagged with the semver version. Once the proxy is up, it applies infrastructure to test through the Chisel tunnel. After a successful apply, integration tests run automatically (two phases — see below).
  5. Promote to prod: Use manual-dispatch.yml, select prod + apply, and provide the semver tag. The workflow verifies all container images exist with that tag, then deploys using the semver-tagged images (not latest). This is gated — requires approval from designated reviewers (configured in the GitHub prod environment protection rules).
  6. Release created: After successful prod apply, a GitHub Release is automatically created from the deployed tag with deployment metadata.
PROD is gated: The prod environment has required reviewers configured as a GitHub Environment protection rule. Any workflow job targeting prod will pause and wait for manual approval before executing. Only designated reviewers can approve.
Developer branch testing (occasional): Developers can run manual dispatch to dev from a feature/PR branch for targeted validation; use this sparingly and coordinate with other developers. The dev environment does not include App Gateway or DNS Zone, so full ingress-path validation is only available in test and prod.
Feature Branch
   │
   └── Pull Request ──► pr-open.yml
                        ├─ .lint.yml (fmt, tflint, conventional commits, fork check)
                        ├─ .builds.yml
                        ├─ .deployer.yml (tools/azure_proxy)
                        ├─ .deployer-using-secure-tunnel.yml (test plan)
                        └─ Update PR description with plan summary
                                  │
                                  ▼
                             Merge to main
                                  │
                                  ▼
                        merge-main.yml
                        ├─ Semantic version tag (v1.2.3) ─────────────────── concurrent
                        ├─ Tag container images (latest → v1.2.3) ──────── after semver
                        ├─ Deploy azure_proxy (tools) ───────────────────────────────┤
                        └─ Apply to test (via Chisel tunnel)
                             └─ Integration tests (post-apply)
                                  ├─ Direct: all tests except apim-key-rotation.bats
                                  └─ Via proxy: apim-key-rotation.bats (KV private endpoint)
                                  │
                                  ▼
                        manual-dispatch.yml (prod + apply + tag)
                        ├─ ⏸ Requires prod environment approval
                        ├─ Verify container images exist for semver tag
                        ├─ Apply to prod using semver-tagged images
                        └─ Create GitHub Release

Advanced Branching Patterns

The basic SDLC flow above covers the single-feature-per-PR path. In practice, teams often need to coordinate dependent work or bundle multiple features into a single release. Two patterns handle this: Stacked PRs and Release PRs.

Stacked PRs (Dependent Feature Chains)

Use stacked PRs when a feature depends on another feature that hasn't merged to main yet. Each PR in the stack targets the previous branch instead of main.

When to Use

How It Works

  1. Create feat/base-network from main → open PR #1 targeting main
  2. Create feat/add-apim-policy from feat/base-network → open PR #2 targeting feat/base-network
  3. Each PR triggers pr-open.yml independently for lint and plan
  4. Merge PR #1 first (bottom of the stack) — this triggers merge-main.yml
  5. Retarget PR #2 to main and rebase onto updated main
  6. Merge PR #2 — triggers another merge-main.yml run with its own semver tag
main ─────────────────────────┬───────────────────┬──────►
   \                          │ merge PR #1        │ merge PR #2
    └─ feat/base-network ─────┘   (v1.3.0)        │
         \                                         │
          └─ feat/add-apim-policy ─── rebase ──────┘
                                                 (v1.4.0)
Important: Always merge bottom-up. If you merge PR #2 before PR #1, the diff will include both sets of changes and the base branch won't exist in main yet. After merging the base PR, rebase the dependent branch onto main to pick up any squash-merge differences before merging.
CI Behaviour: Each PR in the stack runs pr-open.yml against test. The plan for PR #2 will show changes from both branches while PR #1 is open (because the diff includes the base). After PR #1 merges, re-running PR #2's checks shows only its own changes. Each merge to main produces its own semver tag.
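The bottom-up merge and rebase in steps 4 and 5 can be simulated locally with plain git. Branch names follow the example above; the local squash merge stands in for GitHub's squash-merge button, and the repo here is throwaway:

```shell
set -eu
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
cd "$(mktemp -d)"
git init -q -b main
git commit -q --allow-empty -m "chore: init"

git checkout -q -b feat/base-network
echo vnet > network.tf && git add network.tf && git commit -q -m "feat: base network"

git checkout -q -b feat/add-apim-policy
echo policy > apim.tf && git add apim.tf && git commit -q -m "feat: apim policy"

# Merge PR #1 (bottom of the stack) as a squash commit on main
git checkout -q main
git merge --squash -q feat/base-network
git commit -q -m "feat: base network (#1)"

# Retarget PR #2: replay only its own commits onto the updated main
git rebase -q --onto main feat/base-network feat/add-apim-policy

# Only the apim-policy commit is now ahead of main
git log --oneline main..HEAD
```

The `--onto main feat/base-network` form is what makes the rebase drop the already-squash-merged base commits instead of replaying them.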

Release PRs (Bundled Multi-Feature Releases)

Use a Release PR when multiple developers are working on separate features that should ship together as a single coordinated release. All feature branches merge into a shared release branch, which then opens one PR to main.

When to Use

How It Works

  1. Create a release branch from main: release/sprint-42 (or release/apim-v2, etc.)
  2. Developers create feature branches from the release branch:
    • feat/new-tenant-config → PR targeting release/sprint-42
    • feat/apim-rate-limits → PR targeting release/sprint-42
    • fix/dns-zone-ttl → PR targeting release/sprint-42
  3. Feature PRs are reviewed and merged into the release branch (these merges do not trigger merge-main.yml since they don't target main)
  4. Optionally deploy the release branch to dev via manual-dispatch.yml to validate the combined changes
  5. When all features are complete, open one Release PR: release/sprint-42 → main
  6. The Release PR triggers pr-open.yml — the plan shows the aggregate of all bundled changes
  7. Merge the Release PR → merge-main.yml creates one semver tag and applies to test
main ─────────────────────────────────────────┬──────────►
   \                                           │ merge Release PR
    └─ release/sprint-42 ───┬──────┬──────────┘  (v2.0.0)
         \                  │      │
          ├─ feat/tenant ───┘      │
          │                        │
          └─ feat/rate-limits ─────┘
CI Behaviour: PRs targeting the release branch still trigger pr-open.yml (lint and plan), giving each feature its own review cycle. The final Release PR to main runs the full pipeline and shows a combined plan. Only the merge to main triggers merge-main.yml for semver tagging and test apply.
PR Title Convention: The Release PR title must follow Conventional Commits (e.g. feat: sprint 42 release — tenant config and rate limits) since it controls the semver bump. Use feat: for minor, fix: for patch, or include BREAKING CHANGE in the body for major.
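As a rough sketch of that bump rule (the repository delegates the real decision to conventional-changelog tooling; this function is only an approximation for illustration):

```shell
# Approximate mapping from a Conventional Commits PR title to the semver
# bump; not the changelog action's actual logic
bump_for_title() {
  case "$1" in
    *"BREAKING CHANGE"*|*'!: '*) echo major ;;
    feat:*|'feat('*)             echo minor ;;
    fix:*|'fix('*)               echo patch ;;
    *)                           echo none ;;   # chore:, docs:, etc.
  esac
}

bump_for_title "feat: sprint 42 release"   # minor
bump_for_title "fix: dns zone ttl"         # patch
```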

Choosing Between Stacked PRs and Release PRs

Stacked PRs Release PR
Best for Sequential/dependent changes by 1–2 developers Parallel independent features by multiple developers
Semver tags One tag per merged PR (e.g. v1.3.0, v1.4.0) One tag for the entire bundle (e.g. v2.0.0)
Review granularity Each PR is reviewed independently with a focused diff Feature PRs reviewed individually; Release PR shows combined diff
merge-main.yml runs Triggers once per PR merged to main Triggers once for the entire release
Risk Rebase conflicts after base merges Release branch can drift from main if long-lived
Recommendation Keep stacks shallow (2–3 deep max) Keep release branches short-lived; rebase from main regularly

When Do I Need Self-Hosted Runners?

How this platform solves the private-endpoint problem: All continuous integration and deployment work in this repository runs on standard GitHub-hosted ubuntu-24.04 runners. When a workflow needs to reach a private endpoint, the Terraform deployer starts a Chisel tunnel and Privoxy inside Docker on the runner; this temporary proxy routes data-plane traffic through the proxy service deployed in the tools virtual network. Because of that setup, this repository does not need self-hosted runners for its own pipelines.

The optional github_runners_aca module, deployed through add-or-remove-module.yml, can create self-hosted runners inside the virtual network. Use that option when other repositories or workloads need long-lived compute that already lives inside the private network, not for this repository's own automation.

The table below describes the general pattern. Data-plane work that can't use the Chisel tunnel approach — for example, other repos without the proxy setup — would still need self-hosted runners.

Understanding the Difference: See ADR-011: Control Plane vs Data Plane for the detailed explanation of why identity alone is enough for some operations, while private network access is also required for others.
Operation Plane Public Runner? Example
Create resources (VMs, VNets, Key Vault) Control ✓ Yes azurerm_key_vault
Configure settings, RBAC roles Control ✓ Yes azurerm_role_assignment
Deploy private endpoints Control ✓ Yes azurerm_private_endpoint
Read/write Key Vault secrets Data ✗ No azurerm_key_vault_secret
Read/write Storage blobs Data ✗ No azurerm_storage_blob
Terraform state (if private) Data ✗ No Backend storage account

Public Runners Work For

  • Deploying all infrastructure modules
  • Network, Bastion, Jumpbox, Proxy
  • Any resource that doesn't read secrets
  • Documentation builds (pages.yml)

Self-Hosted Required For

  • Terraform using azurerm_key_vault_secret
  • Private state backend (blocked by PE)
  • Any code that reads secrets at plan time
  • Database migrations, blob uploads
Cost Tip: If your Terraform doesn't need data plane access, stick with public runners. Self-hosted runners on Container Apps add cost. Only enable github_runners_aca_enabled if you actually need data plane access in CI/CD.

.deployer.yml (Reusable Workflow)

Reusable Terraform workflow for initial-setup/infra, primarily used to manage the tools environment and targeted foundational modules.

Inputs

Input Type Description
environment_name string Target environment name (commonly tools)
command string Terraform command (init, plan, apply, destroy)
target_module string Required module target (for example: azure_proxy, bastion, jumpbox)

Key Features

Required Permissions

permissions:
  id-token: write   # Required for OIDC token generation
  contents: read    # Required for repository checkout
Important: Without id-token: write permission, the workflow cannot generate the OIDC token needed for Azure authentication.
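For reference, a simplified sketch of the exchange that azure/login performs with that permission. The ACTIONS_ID_TOKEN_REQUEST_* variables are injected by GitHub only when the job declares id-token: write, so the guard makes this a no-op outside a workflow:

```shell
# Simplified view of the OIDC exchange; real workflows just use azure/login
if [ -n "${ACTIONS_ID_TOKEN_REQUEST_URL:-}" ]; then
  # 1. Ask GitHub for an OIDC token scoped to Azure's exchange audience
  OIDC_TOKEN=$(curl -sSf \
    -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
    "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=api://AzureADTokenExchange" \
    | jq -r '.value')

  # 2. Trade it for an Azure session; no stored client secret involved
  az login --service-principal \
    --username "$AZURE_CLIENT_ID" \
    --tenant "$AZURE_TENANT_ID" \
    --federated-token "$OIDC_TOKEN"
else
  echo "Not running inside a GitHub Actions job with id-token: write"
fi
```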

.deployer-using-secure-tunnel.yml (Reusable Workflow)

Reusable Terraform workflow for infra-ai-hub. It receives encrypted proxy outputs from .deployer.yml, starts Chisel + Privoxy, then runs stack deployment and (for apply in dev/test) integration tests.

Highlights


.lint.yml (Reusable Workflow)

Reusable validation workflow used by PR checks. Runs Terraform formatting and linting, enforces PR title conventions, and blocks forks.

What It Runs


.builds.yml (Container Build Workflow)

Reusable workflow for building and pushing container images to GitHub Container Registry (GHCR). Supports matrix builds for multiple packages including the azure-proxy services.

Built Packages

Image Tagging Strategy

Container images are tagged at different stages of the pipeline:

Stage Tags Applied Example
PR Build (pr-open.yml) PR number, run number, latest 42, 42-7, latest
Main Merge (merge-main.yml) Semver tag added to latest image v1.2.3
Prod Deploy (manual-dispatch.yml) Deploys using the semver-tagged image v1.2.3 (immutable)

The latest tag is used for dev and test environments. Production always uses the semver tag to ensure the exact tested image is what runs in prod.
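The promotion from :latest to the semver tag can be done without rebuilding by copying the already-pushed manifest. A dry-run sketch using docker buildx imagetools, with ghcr.io/<org> as a placeholder registry path (the repository's actual re-tag mechanism may differ):

```shell
VERSION=v1.2.3
REGISTRY="ghcr.io/<org>"   # <org> is a placeholder for the GitHub org
for image in azure-proxy/chisel azure-proxy/privoxy jobs/apim-key-rotation; do
  # Drop the leading `echo` to actually promote :latest to the semver tag
  echo docker buildx imagetools create \
    --tag "$REGISTRY/$image:$VERSION" \
    "$REGISTRY/$image:latest"
done
```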


add-or-remove-module.yml

Manual workflow for deploying or destroying infrastructure modules on demand. Evolved from the earlier bastion-only workflow to support all optional modules (bastion, jumpbox, azure_proxy, github_runners_aca).

When to Use

  • Deploy Module: When you need to provision a specific infrastructure module
  • Destroy Module: When done, to save costs or clean up resources

How to Run

  1. Go to Actions tab in GitHub
  2. Select "Deploy or Remove Bastion Host"
  3. Click "Run workflow"
  4. Choose tools, module, and command

Workflow Inputs

Input Options Description
environment_name tools Target environment
module bastion, jumpbox, azure_proxy, github_runners_aca Module to deploy or destroy
command apply, destroy Terraform command to execute
Cost Optimization: Azure Bastion has hourly charges (~$0.19/hour for Basic SKU). Deploy only when needed and destroy when done.

pr-open.yml (Pull Request Checks)

Runs on pull request events (and manual dispatch) to provide fast CI signal before merge.

Automated Checks

Note: The pr-open workflow must pass before a PR can be merged. The plan step appends an AI Hub Infra Changes section to the PR description with a readable plan summary; when there are no infra diffs the section says No Changes to AI Hub Infra in this PR.

schedule.yml (Scheduled Cleanup)

Automatically destroys Bastion every day at 5 PM PST to prevent unnecessary charges.

Schedule

# Runs daily at 5 PM PST (1 AM UTC next day)
on:
  schedule:
    - cron: "0 1 * * *"

Workflow Logic

  1. Check if Bastion exists: Uses Azure CLI to query the Bastion resource
  2. Conditional destroy: Only runs Terraform destroy if Bastion is found
  3. Status notification: Reports whether Bastion was destroyed or already removed

Jobs Flow

check-and-destroy-bastion
         │
         ├── bastion_exists=true ──► destroy-bastion ──► notification
         │
         └── bastion_exists=false ──────────────────────► notification
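A minimal sketch of the existence check, assuming placeholder resource names and the az CLI bastion extension; in the real workflow the flag is written to "$GITHUB_OUTPUT" so the downstream jobs can branch on it:

```shell
# Resource names below are placeholders, not the real ones
RG=tools-networking-rg   # placeholder resource group
NAME=bastion-tools       # placeholder Bastion name
if az network bastion show -g "$RG" -n "$NAME" -o none 2>/dev/null; then
  echo "bastion_exists=true"
else
  echo "bastion_exists=false"
fi
```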

manual-dispatch.yml (Manual Promotion Workflow)

Use this workflow to run plan, apply, or destroy for dev, test, and prod.

How It Works

  1. Validates inputs — prod apply requires a deploy_tag (a semver tag like v1.2.3 from a successful merge-main.yml run)
  2. Verifies all container images exist with the semver tag (prod apply only)
  3. Deploys/refreshes azure_proxy in tools via .deployer.yml
  4. Passes encrypted proxy outputs to .deployer-using-secure-tunnel.yml
  5. Executes selected Terraform command in chosen environment — prod uses the semver-tagged container images (e.g. :v1.2.3) while dev/test use :latest
  6. For prod apply only: Creates a GitHub Release from the deployed tag with deployment metadata
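Step 1's input gate can be approximated like this (illustrative only, not the workflow's exact code):

```shell
# Prod apply must name a vMAJOR.MINOR.PATCH tag; anything else is rejected
is_semver_tag() {
  printf '%s' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Defaults stand in for the workflow_dispatch inputs
ENVIRONMENT="${ENVIRONMENT:-prod}"
COMMAND="${COMMAND:-apply}"
DEPLOY_TAG="${DEPLOY_TAG:-v1.2.3}"

if [ "$ENVIRONMENT" = prod ] && [ "$COMMAND" = apply ] \
   && ! is_semver_tag "$DEPLOY_TAG"; then
  echo "::error::prod apply requires a deploy_tag like v1.2.3"
  exit 1
fi
echo "inputs ok: $ENVIRONMENT $COMMAND $DEPLOY_TAG"
```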
PROD is gated: The prod environment has required reviewers configured as a GitHub Environment protection rule. Any workflow job targeting prod will pause and wait for manual approval from designated reviewers before executing. Additionally, prod apply requires you to specify which semver git tag to deploy — ensuring only test-validated commits reach production.
Dev Environment Scope: Developers may occasionally deploy to dev from a PR branch using workflow dispatch to test specific features. The dev environment does not include App Gateway or DNS Zone; end-to-end ingress and DNS validation is only supported in test and prod.

merge-main.yml (Auto Apply + Semantic Version on Main)

This workflow runs on every push to main. It creates a semantic version tag using Conventional Commits and then applies infrastructure to test.

Execution Flow

  1. Concurrently on push to main:
    • Semantic version: TriPSs/conventional-changelog-action inspects commits since last tag (feat: → minor, fix: → patch, BREAKING CHANGE → major), creates a git tag (e.g. v1.2.3), and pushes an updated CHANGELOG.md
    • Proxy bootstrap: deploys/refreshes azure_proxy in tools via .deployer.yml
  2. Tag container images: After the semver tag is created, all container images (azure-proxy/chisel, azure-proxy/privoxy, jobs/apim-key-rotation) that have a :latest tag are re-tagged with the semver version (e.g. :v1.2.3). This runs in parallel via a matrix strategy.
  3. Once azure_proxy is live, its encrypted URL+auth outputs are passed to .deployer-using-secure-tunnel.yml
  4. Run apply against test (through Chisel tunnel)
  5. After successful apply, integration tests run automatically in two phases (direct + via proxy — see .deployer-using-secure-tunnel.yml section)
Why semver tags? Semantic version tags serve as the input for prod deployments. When promoting to prod via manual-dispatch.yml, you provide a version tag (e.g. v1.2.3) to ensure the exact commit that passed test is what gets applied to production. The version is derived from commit messages, so feat: and fix: prefixes directly control versioning.
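The tag arithmetic behind the bump is simple. A sketch (the pipeline delegates this to TriPSs/conventional-changelog-action, so this is purely illustrative):

```shell
next_version() {  # usage: next_version v1.2.3 major|minor|patch
  bump=$2
  oldIFS=$IFS; IFS=.
  set -- ${1#v}          # strip the leading v, split MAJOR MINOR PATCH
  IFS=$oldIFS
  case "$bump" in
    major) echo "v$(($1 + 1)).0.0" ;;
    minor) echo "v$1.$(($2 + 1)).0" ;;
    patch) echo "v$1.$2.$(($3 + 1))" ;;
  esac
}

next_version v1.2.3 minor   # feat: commits since v1.2.3 -> v1.3.0
next_version v1.3.0 patch   # fix: commits since v1.3.0  -> v1.3.1
```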

Release Process (TEST → PROD)

Production releases follow a tag-based promotion flow:

Steps to Release to Production

  1. Merge your PR to main — the test deployment and semantic versioning run automatically
  2. Verify the test deployment succeeded and a version tag was created (visible in Actions summary and repo tags, e.g. v1.2.3)
  3. Go to Actions → Deploy to Environments (Manual Dispatch)
  4. Select: environment=prod, command=apply, deploy_tag=v1.2.3
  5. Click Run workflow — the job will pause waiting for prod environment approval
  6. A designated reviewer approves the deployment in the Actions UI
  7. After successful apply, a GitHub Release is automatically created with deployment details

Release Contents

Each GitHub Release created for prod includes:


Concurrency Strategy

Several workflows and reusable jobs use concurrency to prevent race conditions and conflicting Terraform operations.

Workflow/Job Concurrency Group Why It Helps
.deployer.yml tools Serializes tools/bootstrap changes so multiple runs don't mutate shared proxy infra at the same time.
.deployer-using-secure-tunnel.yml ${environment_name} Ensures only one Terraform operation per environment runs at once (for example, only one test apply).
pr-open.yml builds job builds-${PR number} (cancel in-progress) Stops stale image builds when a newer commit arrives in the same PR.
manual-dispatch.yml manual-deploy-${run_id} Keeps each manually triggered deployment isolated and traceable to a single run.
merge-main.yml deploy-test-on-main Queues merges to main so test applies execute in order and avoid state lock contention.
Practical Effect: Concurrency improves deployment reliability by reducing Terraform state lock failures, avoiding overlapping applies, and ensuring each environment converges predictably.

APIM Key Rotation

APIM subscription key rotation is now handled by a Container App Job (scheduled) deployed as a custom container. The job source is at jobs/apim-key-rotation/ and the Terraform module at infra-ai-hub/modules/key-rotation-function/.

.builds.yml (Matrix Entry)

The key rotation container image is built by the shared .builds.yml reusable workflow as a matrix entry alongside other containers (chisel, privoxy).

  • Called by pr-open.yml on PR open/update
  • Triggers on changes to jobs/apim-key-rotation/ or the workflow itself
  • Uses bcgov/action-builder-ghcr for image build and push
  • Tagged with PR number, run number, and latest

Semver Image Tagging

On merge to main, merge-main.yml re-tags the :latest image with the semver version (e.g. :v1.2.3). This ensures prod deployments use an immutable, version-pinned image rather than the mutable :latest tag.

  • Dev/test use :latest (with a FORCE_IMAGE_PULL env var to trigger re-pull each Terraform apply)
  • Prod uses the semver tag (e.g. :v1.2.3) — immutable and traceable
  • Terraform container_image_tag_job_key_rotation variable controls which tag is deployed

pages.yml (Documentation)

Deploys this documentation site to GitHub Pages when changes are pushed to docs or Terraform roots (so generated references stay current).

Triggers

Deployment Steps

  1. Checkout repository
  2. Run docs/generate-tf-docs.sh to refresh Terraform reference content
  3. Run docs/build.sh to generate HTML from templates
  4. Configure GitHub Pages
  5. Upload docs/ folder as artifact
  6. Deploy to GitHub Pages

Environment Secrets

Each GitHub environment requires these secrets (created by initial-azure-setup.sh):

Secret Description Source
AZURE_CLIENT_ID Managed Identity client ID Created by setup script
AZURE_TENANT_ID Azure AD tenant ID Your Azure subscription
AZURE_SUBSCRIPTION_ID Target subscription ID Your Azure subscription
VNET_RESOURCE_GROUP_NAME Resource group containing VNet Your infrastructure
VNET_NAME Existing VNet name Your infrastructure
VNET_ADDRESS_SPACE VNet CIDR block Your infrastructure
SOURCE_VNET_ADDRESS_SPACE Source VNet (tools) CIDR for NSG rules Your infrastructure
SUBNET_ALLOCATION JSON object for subnet_allocation (map(map(string))) Azure Blob (network-info/subnet-allocation) then copied to GitHub secret
EXTERNAL_PEERED_PROJECTS Optional JSON object for external_peered_projects (map(object)) Azure Blob (network-info/subnet-allocation) then copied to GitHub secret

Subnet Allocation Process (Blob First)

For visibility and team collaboration, keep network JSON in Azure Blob first, then copy the same JSON into GitHub environment secrets.

  1. Update the environment file in Blob storage (for example, subnet-allocation-dev.json or subnet-allocation-test.json in network-info/subnet-allocation).
  2. Copy the JSON payload and paste it into the matching GitHub environment secret SUBNET_ALLOCATION.
  3. If direct APIM access from peered VNets is needed, copy the external_peered_projects JSON payload into optional secret EXTERNAL_PEERED_PROJECTS.
  4. Run plan in that environment to validate before apply.
Format rule: Paste raw JSON in GitHub secrets (single line is safest). Do not use HCL syntax (for example, subnet_allocation = { ... } or external_peered_projects = { ... }) and do not paste escaped JSON with backslashes.
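A quick local sanity check of the payload shape before pasting it into the secret (the CIDR values here are made up):

```shell
# The secret must be raw JSON shaped like map(map(string)):
# every top-level value an object, every leaf value a string
payload='{"dev":{"app":"10.0.1.0/24","privateendpoints":"10.0.2.0/26"}}'

echo "$payload" | jq -e '
  type == "object" and
  (to_entries | all(.value
    | type == "object" and (to_entries | all(.value | type == "string"))))
' >/dev/null && echo "valid map(map(string))"
```

If the check fails (jq exits non-zero), the payload is either not raw JSON or not the expected nested shape.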

How JSON Flows into Terraform

  1. GitHub Actions reads secrets.SUBNET_ALLOCATION.
  2. The reusable workflow maps it to TF_VAR_subnet_allocation.
  3. Terraform parses the JSON into var.subnet_allocation (map(map(string))).
  4. If secrets.EXTERNAL_PEERED_PROJECTS is present, the workflow exports it as TF_VAR_external_peered_projects and Terraform parses it into var.external_peered_projects.

Common Errors You Might See

Running Workflows Locally

For local development and testing, use the deployment scripts in each Terraform root:

Mandatory variable: Local runs must export TF_VAR_subnet_allocation before plan/apply. Source of truth is the tools storage account tftoolsaihubtracking, container tools, path network-info/subnet-allocation/.
# Load required subnet allocation JSON (example: prod)
az account set --subscription "da4cf6-tools - AI Services Hub"
tmpfile=$(mktemp)
az storage blob download \
    --account-name tftoolsaihubtracking \
    --container-name tools \
    --name network-info/subnet-allocation/subnet-allocation-prod.json \
    --auth-mode login \
    --file "$tmpfile"
export TF_VAR_subnet_allocation="$(jq -c . "$tmpfile")"
rm -f "$tmpfile"

# Switch back to your deployment subscription before running terraform
az account set --subscription "da4cf6-dev - AI Services Hub"
# Ensure you're logged in
az login

# Initial setup / tools
./initial-setup/infra/deploy-terraform.sh init
./initial-setup/infra/deploy-terraform.sh plan
./initial-setup/infra/deploy-terraform.sh apply
# AI Hub stacks
./infra-ai-hub/scripts/deploy-terraform.sh plan dev
./infra-ai-hub/scripts/deploy-terraform.sh apply test
Note: Local runs use your Azure CLI credentials instead of OIDC. Make sure you have the required permissions and that TF_VAR_subnet_allocation is exported in the shell.