AI Services Hub
Azure Landing Zone Infrastructure

Technical Deep Dive: Management Access, Data Access, and the Chisel Tunnel

This document explains one of the most important ideas in the whole platform: the difference between managing an Azure resource and accessing the private data inside that resource. It then shows why some Terraform operations fail from a public automation runner and how the Chisel tunnel solves that problem.

Table of Contents

  1. Control Plane vs Data Plane: The Fundamental Divide
  2. Key Vault Operations: What Requires Data Plane Access
  3. Terraform + Key Vault: When Operations Fail
  4. Chisel: How the Tunnel Works
  5. Deployment Scenarios: Who Needs What
  6. Portal → Hub Key Vault Integration

1. Control Plane vs Data Plane: The Fundamental Divide

What is the Control Plane?

The control plane is Azure's management layer. It is the part of the platform that lets you create, update, delete, and describe resources:

Endpoint: management.azure.com

Authentication: OpenID Connect, often shortened to OIDC, works well here because these calls go to Azure's public management endpoint.

What is the Data Plane?

The data plane is where real data access happens. This is the layer you touch when you read, write, upload, download, or query actual tenant or platform data:

Endpoint: Service-specific (e.g., *.vault.azure.net, *.blob.core.windows.net)

Authentication: A token can still be acquired here, but the request itself often fails when private endpoints are enabled, because the caller has no network path to the service.

Why the Divide?

Azure separates these two layers for security and architectural reasons:

Critical Insight: When you enable private endpoints, you are closing off public routes to the data-access side of a service. A tool can still obtain a valid identity token and yet fail because it has no network route to the private endpoint.
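
To make the insight concrete: the two planes use different token audiences, and holding a valid data-plane token does not create a network route. A minimal sketch (the audience values are the standard Azure AD resource URIs for ARM and Key Vault):

```shell
# The same OIDC identity requests different token audiences per plane.
ARM_AUDIENCE="https://management.azure.com/"
KV_AUDIENCE="https://vault.azure.net"
echo "control plane token audience: $ARM_AUDIENCE"
echo "data plane token audience:    $KV_AUDIENCE"
# A valid Key Vault token is necessary but not sufficient: the request
# still fails if the caller has no route to the vault's private endpoint IP.
```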

2. Key Vault Operations: What Requires Data Plane Access

Key Vault Endpoints
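
A single vault exposes two very different URLs, one per plane. A quick sketch of their shapes (subscription ID, resource group, vault and secret names are placeholders; api-versions are illustrative):

```shell
# Control-plane vs data-plane URL shapes for one Key Vault.
SUB="00000000-0000-0000-0000-000000000000"; RG="my-rg"; VAULT="myvault"
CONTROL_PLANE="https://management.azure.com/subscriptions/$SUB/resourceGroups/$RG/providers/Microsoft.KeyVault/vaults/$VAULT?api-version=2023-07-01"
DATA_PLANE="https://$VAULT.vault.azure.net/secrets/my-secret?api-version=7.4"
echo "manage the vault: $CONTROL_PLANE"
echo "read a secret:    $DATA_PLANE"
# A private endpoint removes the public route to DATA_PLANE only;
# CONTROL_PLANE remains reachable from the public internet.
```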

Operations That Require Data Plane Access

Reading Secrets

// This REQUIRES data plane access
data "azurerm_key_vault_secret" "example" {
  name         = "my-secret"
  key_vault_id = "/subscriptions/.../vaults/myvault"
}

// When private endpoints are enabled:
// ✗ FAILS: Cannot reach myvault.vault.azure.net from GitHub runner
// ✓ WORKS: Can reach myvault.vault.azure.net from within VNet or via Chisel

Writing Secrets

// This REQUIRES data plane access
resource "azurerm_key_vault_secret" "example" {
  name         = "my-secret"
  value        = "secret-value"
  key_vault_id = azurerm_key_vault.main.id

  // Terraform must connect to: myvault.vault.azure.net
  // When private endpoints are enabled:
  // ✗ FAILS: GitHub runner cannot reach myvault.vault.azure.net
  // ✓ WORKS: Chisel tunnel provides access to myvault.vault.azure.net
}

Why These Fail Even When OpenID Connect Works

GitHub Runner (OIDC Authentication)
        ↓
Azure AD (Token: aud=management.azure.com)
        ↓
Azure ARM API (management.azure.com) ✓ WORKS
        ↓
Terraform creates resource
        ↓
Terraform tries to write secret
        ↓
needs to access: myvault.vault.azure.net ✗ BLOCKED
                ↑
        Private endpoint blocks public access
        GitHub runner has no route to VNet

3. Terraform + Key Vault: When Operations Fail

The following examples show how Terraform interacts with Key Vault. In this repository, the real implementation is spread across infra-ai-hub/stacks/ rather than one single main.tf file, but the same networking rule still applies.

Creating Key Vault (Control Plane - Works)

resource "azurerm_key_vault" "main" {
  name                = var.key_vault_name
  location            = var.location
  resource_group_name = azurerm_resource_group.main.name
  tenant_id           = data.azurerm_client_config.current.tenant_id // required
  sku_name            = "standard"                                   // required
}

// ✓ WORKS with OIDC
// Uses: management.azure.com
// No private endpoint yet, so no blocking

Creating Private Endpoint (Control Plane - Works)

resource "azurerm_private_endpoint" "key_vault_pe" {
  name                = "${var.app_name}-kv-pe"
  location            = var.location
  resource_group_name = azurerm_resource_group.main.name
  subnet_id           = module.network.private_endpoint_subnet_id

  private_service_connection {
    name                           = "${var.app_name}-kv-psc"
    private_connection_resource_id = azurerm_key_vault.main.id
    is_manual_connection           = false
    subresource_names              = ["vault"]
  }
}

// ✓ WORKS with OIDC
// Uses: management.azure.com
// This BLOCKS public access to *.vault.azure.net

Waiting for DNS (DNS Propagation Workaround)

resource "null_resource" "wait_for_key_vault_private_dns" {
  triggers = {
    private_endpoint_id = azurerm_private_endpoint.key_vault_pe.id
    // ... other values
  }

  provisioner "local-exec" {
    command = <<EOT
      # Wait for DNS to be ready before trying data plane operations.
      # Azure creates private DNS records asynchronously, so poll until
      # the vault hostname (derived from vault_uri) resolves.
      HOST=$(echo "${azurerm_key_vault.main.vault_uri}" | sed -e 's|https://||' -e 's|/.*||')
      for i in $(seq 1 24); do
        nslookup "$HOST" >/dev/null 2>&1 && exit 0
        sleep 5
      done
      echo "Timed out waiting for $HOST to resolve" >&2
      exit 1
    EOT
  }

  depends_on = [azurerm_private_endpoint.key_vault_pe]
}

// Why necessary?
// After private endpoint creation, Azure asynchronously:
// 1. Creates private DNS zone: privatelink.vaultcore.azure.net
// 2. Links it to the VNet
// 3. Creates A records for the private endpoint
// 4. This can take 30-120 seconds
//
// If we try to create secrets immediately:
// ✗ FAILS: Cannot resolve myvault.vault.azure.net
// ✓ WORKS: After DNS is ready, myvault.vault.azure.net resolves to private IP

Creating Secrets (Data Plane - FAILS without Chisel)

resource "azurerm_key_vault_secret" "secret_one" {
  name            = "example-secret-test-one"
  value           = random_password.secret_one.result
  key_vault_id    = azurerm_key_vault.main.id
  expiration_date = "2025-12-31T23:59:59Z"

  depends_on = [null_resource.wait_for_key_vault_private_dns]
}

resource "azurerm_key_vault_secret" "secret_two" {
  name            = "example-secret-test-two"
  value           = random_password.secret_two.result
  key_vault_id    = azurerm_key_vault.main.id
  expiration_date = "2025-12-31T23:59:59Z"

  depends_on = [null_resource.wait_for_key_vault_private_dns]
}

// ✗ FAILS with OIDC + Private Endpoints
// Terraform tries to connect to: myvault.vault.azure.net
// GitHub runner has no route to the private IP
//
// ✓ WORKS with:
//   1. Self-hosted runner inside VNet
//   2. Bastion + Jumpbox (VPN to VNet)
//   3. Chisel tunnel (App Service → VNet → *.vault.azure.net)

4. Chisel: How the Tunnel Works

The Problem Chisel Solves

BEFORE Chisel:
┌──────────────┐         ┌──────────────┐
│ GitHub Runner │ ──OIDC──> │ Azure ARM API │
│ (Public)      │          │ mgmt.azure.com│
└──────────────┘          └──────┬───────┘
                                │
                                │ Creates resources
                                ↓
┌──────────────┐         ┌──────▼───────┐
│ GitHub Runner │         │ Private      │
│ ✗ No route   │         │ Endpoint     │
│ to VNet      │         │ *.vault.azure.net │
└──────────────┘         │ BLOCKED      │
                          └──────────────┘

AFTER Chisel:
┌──────────────┐         ┌──────────────┐
│ GitHub Runner │ ──OIDC──> │ Azure ARM API │
│ (Public)      │          │ mgmt.azure.com│
└──────┬───────┘          └──────┬───────┘
       │                          │
       │ Creates                  │
       ↓                          │
┌──────▼──────────┐         ┌─────▼──────┐
│ App Service     │         │ Private    │
│ (Chisel Server) │────────> │ Endpoint   │
│ VNet Integrated │         │ *.vault.azure.net │
└──────┬──────────┘         │ ACCESSIBLE │
       │                          │
       │ Chisel tunnel            │
       ↓ (HTTPS/WebSocket)        │
┌──────▼──────────┐              │
│ Local Machine   │ ←────────────┘
│ (Chisel Client) │
└─────────────────┘

How Chisel Works (Technical Deep Dive)

Step 1: App Service Deployment (Control Plane - Works with OIDC)

From initial-setup/infra/modules/azure-proxy/main.tf:

resource "azurerm_linux_web_app" "azure_proxy" {
  name                      = "${var.app_name}-${var.app_env}-azure-proxy-${random_string.proxy_dns_suffix.result}"
  resource_group_name       = var.resource_group_name
  location                  = var.location
  service_plan_id           = azurerm_service_plan.azure_proxy_asp.id

  // ✓ KEY: VNet Integration
  virtual_network_subnet_id = var.app_service_subnet_id

  // ✓ KEY: Managed Identity for Docker pulls
  identity {
    type = "SystemAssigned"
  }

  site_config {
    application_stack {
      docker_image_name   = var.azure_proxy_image
      docker_registry_url = var.container_registry_url
    }
  }

  app_settings = {
    // ✓ KEY: Chisel authentication
    CHISEL_AUTH = "${random_uuid.proxy_chisel_username.result}:${random_password.proxy_chisel_password.result}"
  }
}

// ✓ This WORKS with OIDC because:
// 1. Creates App Service (management.azure.com) ✓
// 2. Pulls Docker image (docker.io) ✓
// 3. Deploys to App Service (management.azure.com) ✓
// 4. Integrates with VNet (management.azure.com) ✓

Step 2: Chisel Server Runs in App Service

From azure-proxy/chisel/start-chisel.sh:

#!/bin/sh
set -e

# Chisel runs as a server inside the App Service
# Listens on port 80 (configured by PORT env var)

chisel server \
  --port $PORT \
  --auth $CHISEL_AUTH \
  --socks5 \
  --reverse

# What this does:
# --port 80: Listen on HTTP (App Service handles HTTPS termination)
# --auth: Require authentication (prevents unauthorized access)
# --socks5: Enable SOCKS5 proxy (for general traffic)
# --reverse: Enable reverse tunneling (clients can expose local ports)

Step 3: Local Machine Connects

From azure-proxy/chisel/README.md:

# Developer runs this on their laptop:

docker run --rm -it -p 5462:5432 jpillora/chisel:latest client \
  --auth "tunnel:XXXXXXX" \
  https://${azure-proxy-app-service-url} \
  0.0.0.0:5432:${postgres_hostname}:5432

# What happens:
# 1. Chisel client on laptop connects to Chisel server in App Service
# 2. Connection goes: laptop → Internet → App Service (public)
# 3. App Service is VNet-integrated
# 4. Chisel forwards: laptop:5462 → container:5432 → App Service → VNet → postgres:5432
#    (docker's -p 5462:5432 publishes the container listener on laptop port 5462)

Step 4: Traffic Flow with Private Endpoints

┌──────────────┐
│ Laptop       │
│ Chisel Client│
│ Port 5432    │
└──────┬───────┘
       │
       │ 1. Connect to App Service
       │ HTTPS / WebSocket
       ↓
┌──────▼──────────┐
│ App Service     │
│ (Chisel Server) │
│ Public endpoint │
│ VNet integrated │
└──────┬──────────┘
       │
       │ 2. Inside VNet
       │ Private IPs
       ↓
┌──────▼──────────────────┐
│ Private Endpoint        │
│ Subnet                  │
│ Routes to:              │
│ myvault.vault.azure.net │
└──────┬──────────────────┘
       │
       │ 3. Data plane access
       │ *.vault.azure.net
       ↓
┌──────▼──────────┐
│ Key Vault       │
│ Data operations │
│ - Get secrets   │
│ - Set secrets   │
│ - List secrets  │
└─────────────────┘

✓ SUCCESS: Terraform can read/write secrets!
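
With the tunnel up, data-plane tools can be pointed through it. A minimal sketch, assuming the chisel client was started with the `socks` remote (default listener 127.0.0.1:1080) and noting that Go-based tools such as Terraform honor a socks5 HTTPS_PROXY:

```shell
# Route data-plane HTTPS traffic through the tunnel's SOCKS5 listener
# (address is the chisel client default; adjust to your setup).
export HTTPS_PROXY="socks5://127.0.0.1:1080"
echo "HTTPS_PROXY=$HTTPS_PROXY"
# terraform apply   # Key Vault requests now traverse laptop → App Service → VNet
```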

Why Chisel vs Alternatives?

Method                Cost         Setup   Use Case                      Limitations
Bastion + Jumpbox     $200-300/mo  High    Admin access, full VM         Expensive, overkill for a single developer
Self-hosted Runner    $100/mo      Medium  CI/CD with secrets            Always running, needs management
Chisel + App Service  $15/mo       Low     Local dev, on-demand access   Manual connection needed

5. Deployment Scenarios: Who Needs What

Scenario 1: Platform Team (Deploys Landing Zone)

Who: Platform Services Team, Infrastructure Admins

What they deploy:

When they need data plane access:

Recommended access method:

Scenario 2: Project Teams (Deploy Applications)

Who: Ministry Application Teams

What they deploy:

What they DON'T deploy:

When they need data plane access:

Recommended access method:

Scenario 3: Solo Developer

Who: Single developer, proof-of-concept work

What they deploy:

Recommended setup:

Skip:

6. Portal → Hub Key Vault Integration

The Tenant Onboarding Portal reads APIM subscription keys from the hub Key Vault at request time, rather than storing them in its own data store. This section describes the end-to-end flow and access model.

Access Flow

  1. An authenticated tenant-admin user requests credentials in the portal UI (env tab selected).
  2. The portal backend (AppController.getCredentials) verifies the session and confirms the user is in the tenant's admin_users list.
  3. HubKeyVaultService selects the SecretClient for the requested environment and calls getSecret for {tenant}-apim-primary-key, {tenant}-apim-secondary-key, and {tenant}-apim-rotation-metadata.
  4. The portal's system-assigned Managed Identity authenticates to the hub Key Vault via DefaultAzureCredential. No connection strings or stored credentials are used.
  5. The response is returned over the authenticated session — keys are immediately written to the clipboard client-side and never stored in React state or rendered as DOM text.
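
The naming convention in step 3 can be sketched as a small helper. The function names and api-version below are illustrative, not the portal's actual code; the real lookup goes through HubKeyVaultService and SecretClient.getSecret:

```typescript
// Derive the per-tenant secret names used in step 3 (helper is hypothetical).
function tenantSecretNames(tenant: string): string[] {
  return [
    `${tenant}-apim-primary-key`,
    `${tenant}-apim-secondary-key`,
    `${tenant}-apim-rotation-metadata`,
  ];
}

// Data-plane URL each getSecret call ultimately resolves to
// (api-version is illustrative).
function secretUrl(vaultUri: string, name: string): string {
  return `${vaultUri.replace(/\/+$/, "")}/secrets/${name}?api-version=7.4`;
}
```

For example, `tenantSecretNames("acme")` yields the three `acme-apim-*` names the section describes.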

RBAC Model

The portal MI is granted Key Vault Secrets User on each hub Key Vault (one azurerm_role_assignment per environment in tenant-onboarding-portal/infra/main.tf, gated on var.hub_keyvault_id_{env} != ""). This is a read-only RBAC role — the portal can retrieve secret values but cannot create, update, or delete secrets.
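
A sketch of one such per-environment assignment; the role name and gating expression come from this section, while the resource addresses (e.g. the web app name) are assumptions:

```hcl
resource "azurerm_role_assignment" "portal_kv_secrets_user_dev" {
  // Skip when no hub Key Vault ID was provided for this environment
  count                = var.hub_keyvault_id_dev != "" ? 1 : 0
  scope                = var.hub_keyvault_id_dev
  role_definition_name = "Key Vault Secrets User" // read-only data-plane role
  principal_id         = azurerm_linux_web_app.portal.identity[0].principal_id
}
```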

APIM Tenant Info

The GET /tenants/:name/tenant-info endpoint proxies the APIM /{tenant}/internal/tenant-info policy (which already existed — no APIM changes were required). The portal uses the per-env apimGatewayUrl setting and the tenant's primary key to authenticate the internal call.
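
The internal call has roughly this shape (gateway URL and tenant name are placeholders; `Ocp-Apim-Subscription-Key` is APIM's standard subscription-key header):

```shell
# Request shape for the proxied tenant-info lookup (values are placeholders).
APIM_GATEWAY_URL="https://example-apim.azure-api.net"
TENANT="acme"
echo "GET $APIM_GATEWAY_URL/$TENANT/internal/tenant-info"
echo "Header: Ocp-Apim-Subscription-Key: <tenant primary key>"
```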

Configuration

Six environment variables wire the hub Key Vault URIs and APIM gateway URLs into the portal backend:

These are populated by the CI/CD pipeline from hub Terraform remote state (via the collect-hub-outputs matrix job — see ADR on hub output collection).

Summary

Why We Need Key Vault Access

  1. Security: Private endpoints prevent public access to secrets
  2. Compliance: BC Gov requires zero-trust, no public data plane access
  3. Operations: Applications need to read secrets at runtime
  4. Automation: Terraform needs to write secrets during deployment

Why OIDC Alone Isn't Enough

  1. OIDC provides access to management.azure.com (control plane)
  2. Private endpoints block *.vault.azure.net (data plane)
  3. GitHub runners have no route to VNet private IPs
  4. Even with valid tokens, network access is required

How Chisel Solves This

  1. App Service runs in VNet (via VNet integration)
  2. Chisel server in App Service bridges public → private
  3. Developer connects laptop to App Service via HTTPS
  4. Traffic flows: laptop → App Service → VNet → private endpoints
  5. Result: Data plane access without exposing services publicly

When to Use Each Method