AI Services Hub
Azure Landing Zone Infrastructure

Cost Tracking

Work in Progress
This page is still being refined. The cost tracking design and chargeback process are not fully finalised yet. See the cost allocation decision record for the current direction.

This page explains how platform costs are tracked, how shared costs are split across teams, and which parts of the design are still being worked out. It is written for readers who want to understand the operating model before they dive into detailed formulas and example queries.

Architecture

Project Resource Architecture

What Each Project Gets

Resource                             Type       Description
Resource Group                       Dedicated  rg-{project}-{env}, contains all project resources
Storage Account                      Dedicated  Project documents and data
Search Service                       Dedicated  Project indexes for RAG
Key Vault                            Dedicated  Project secrets and keys
AI Foundry Project                   Dedicated  Project's own API endpoint, prompt flows, index (within shared Hub)
APIM Subscription                    Dedicated  Project's API key for access
AI Models (GPT-4, etc.)              Shared     Deployed once in Hub, accessed via AI Foundry Project
APIM, App Gateway, Firewall          Shared     Entry point infrastructure
Azure Proxy (Chisel Server)          Shared     App Service running Chisel SOCKS5 proxy for CI/CD private-endpoint access (always-on)
CI/CD Runners (GitHub self-hosted)   Optional   Optional Container Apps job + ACR + Log Analytics; provisioned only when github_runners_aca_enabled = true; not used by this platform's own CI/CD

Terraform Deploys Everything

Tenant stacks are deployed with deploy-scaled.sh, which runs Terraform across five isolated state files. Azure Verified Modules (AVM) resource modules are used where available; raw Terraform resources are used when no AVM module exists.

# deploy-scaled.sh executes 3 phases:
# Phase 1: shared foundation (network, KV, AppGW, WAF)
# Phase 2: per-tenant stacks (parallel, isolated state)
# Phase 3: foundry + apim + tenant-user-mgmt
#
# Each stack corresponds to an infra-ai-hub/stacks/* directory.

AVM Modules Used

Resource              AVM Module
Storage Account       avm-res-storage-storageaccount
Search Service        avm-res-search-searchservice
Key Vault             avm-res-keyvault-vault
AI Foundry Hub        avm-res-machinelearningservices-workspace
API Management        avm-res-apimanagement-service
Application Gateway   avm-res-network-applicationgateway
Virtual Network       avm-res-network-virtualnetwork

How Cost Tracking Works

Direct Costs

Storage, Search, Key Vault

Tracked automatically by Azure Cost Management using Resource Group tags.

~40% of total

AI Usage

Tokens, API calls

Tracked by AI Foundry per project. Each AI Foundry Project has its own metrics.

~45% of total

Platform Split

APIM, Gateway, Firewall

Split evenly across all projects (shared infrastructure).

~15% of total

CI/CD Runner Costs (Self-hosted GitHub runners)

The optional github_runners_aca module provisions self-hosted runners inside the VNet for tenant CI/CD pipelines that need private endpoint access. Runners run as Azure Container Apps jobs (scale-to-zero) with supporting services (ACR + Log Analytics). Note: this repo's own CI/CD uses public GitHub runners + the Chisel tunnel instead.

Cost Drivers

  • Runner compute: Container Apps job runtime (primary driver; usage-based)
  • Concurrency cap: max_runners (default 4)
  • Sizing: container_cpu (default 1 vCPU) and container_memory (default 2Gi)
  • Logs: Log Analytics ingestion (GB/month)
  • Images: ACR Premium base cost (fixed) + storage/transfer (Premium is required for Private Link / private endpoints)

Sample Calculation (module defaults)

Assumptions: 20 jobs/day, 10 min/job, avg concurrency = 1.5

Monthly runner hours = 30 * 20 * (10/60) * 1.5 = 150 hours
Total seconds = 150 * 3600 = 540,000 seconds

Canada Central rates (Container Apps):
  vCPU-seconds: $0.0000480/sec
  GiB-seconds:  $0.0000057/sec
  Requests:     $0.565 per million

vCPU cost = 540,000 * 1 vCPU * 0.0000480 = $25.92
Memory cost = 540,000 * 2 GiB * 0.0000057 = $6.16
Request cost (example 1M req/mo) = $0.57

Estimated runner compute total ≈ $32.65/mo

ACR Premium base cost = $2.351/day ≈ $70.53/mo (30-day month)
Estimated compute + ACR base ≈ $103.18/mo

Assumes a single-region ACR Premium registry with no geo-replication and no connected registry. Add Log Analytics ingestion (GB/month) and any ACR transfer/storage beyond the included 500 GB for a full estimate.
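The arithmetic above can be reproduced with a short script, which makes it easier to try other job volumes or container sizes. This is only a sketch: the function name and parameters are illustrative, and the rates are the Canada Central figures quoted in this section rather than values fetched from Azure.

```python
# Sketch: reproduce the sample runner estimate from this section.
# The rates below are the figures quoted above; they are assumptions that may change.
VCPU_RATE = 0.0000480       # $ per vCPU-second
GIB_RATE = 0.0000057        # $ per GiB-second
REQUEST_RATE = 0.565        # $ per million requests
ACR_PREMIUM_PER_DAY = 2.351  # $ per day, single-region Premium registry


def estimate_runner_cost(jobs_per_day, minutes_per_job, avg_concurrency,
                         vcpu=1.0, memory_gib=2.0, million_requests=1.0, days=30):
    """Return (compute_cost, compute_plus_acr_base) in dollars for one month."""
    seconds = days * jobs_per_day * minutes_per_job * 60 * avg_concurrency
    vcpu_cost = seconds * vcpu * VCPU_RATE
    memory_cost = seconds * memory_gib * GIB_RATE
    request_cost = million_requests * REQUEST_RATE
    compute = vcpu_cost + memory_cost + request_cost
    return compute, compute + ACR_PREMIUM_PER_DAY * days


compute, total = estimate_runner_cost(jobs_per_day=20, minutes_per_job=10, avg_concurrency=1.5)
print(f"compute ≈ ${compute:.2f}/mo, compute + ACR base ≈ ${total:.2f}/mo")
```

Because the script rounds only at the end, its totals can differ from the hand calculation by about a cent.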

Example Monthly Costs

Health-RAG Project

Storage + Search + KV       $420     DIRECT
AI usage (60% of tokens)    $1,020   AI
Platform share (50%)        $300     SPLIT
Total                       $1,740

SDPR-Chatbot Project

Storage + Search + KV       $350     DIRECT
AI usage (40% of tokens)    $680     AI
Platform share (50%)        $300     SPLIT
Total                       $1,330
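The two example totals follow from three inputs: each project's direct cost, its share of token usage, and an even split of the platform cost. The sketch below reproduces them. The function name is illustrative, and the $1,700 AI total and $600 platform total are not stated on this page; they are inferred from the example figures ($1,020 being 60% of tokens, and two $300 shares).

```python
def monthly_totals(direct, token_share, ai_total, platform_total):
    """Combine direct, AI, and evenly split platform costs per project."""
    platform_share = platform_total / len(direct)  # even split across projects
    return {
        project: round(direct[project] + ai_total * token_share[project] + platform_share, 2)
        for project in direct
    }


totals = monthly_totals(
    direct={"Health-RAG": 420, "SDPR-Chatbot": 350},
    token_share={"Health-RAG": 0.60, "SDPR-Chatbot": 0.40},
    ai_total=1700,       # assumption: inferred from $1,020 being 60% of tokens
    platform_total=600,  # assumption: inferred from two $300 shares
)
print(totals)
```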

Usage Monitoring, Cost Allocation, and Chargeback Metrics

Introduction

This document defines the approach for implementing Usage Monitoring, Cost Allocation, and Chargeback Metrics for the BC Government AI Services Hub multi-tenant platform. These three interconnected capabilities are essential for operating a shared AI infrastructure that serves multiple ministries while maintaining cost transparency and accountability.

Key Concepts

Usage Monitoring tracks resource consumption at the tenant level to support both cost allocation and operational insights. For the AI Services Hub, monitoring serves two purposes: capturing metrics for shared infrastructure allocation (APIM, App Gateway) and providing operational visibility into service usage patterns.

Cost Allocation combines Azure's native cost tracking with custom calculations for shared resources. The hub architecture uses two allocation models: direct attribution for tenant-dedicated resources (AI Foundry projects with their own Azure OpenAI, Cosmos DB, and AI Search; dedicated Document Intelligence instances), and proportional allocation for shared infrastructure (APIM, App Gateway, monitoring services) where costs are split based on actual usage percentages.

Chargeback Metrics aggregate all cost components—direct resource costs from Azure billing and allocated shared infrastructure costs—into consolidated monthly invoices per tenant.

Document Scope

This document covers:

  1. Tagging strategies for both dedicated and shared resources across the dual-region deployment (Canada Central/East)
  2. Usage tracking pipelines using APIM, Event Hubs, and Azure Functions to capture metrics for shared infrastructure allocation
  3. Cost calculation methods combining Azure Cost Management (for direct attribution) with custom allocation logic (for shared infrastructure)
  4. Implementation patterns with code examples for usage tracking, cost allocation functions, and Kusto queries for chargeback reporting
  5. Network egress tracking to handle cross-region costs between Canada Central (APIM/App Gateway) and Canada East (AI Foundry)

The approach leverages APIM as the central governance point where all AI requests flow—regardless of whether backend services are inside the Foundry landing zone (OpenAI, AI Search) or outside it (Document Intelligence). APIM routing policies direct each tenant to their appropriate backend resources, enabling consistent tenant identification for shared infrastructure cost allocation.


Usage Monitoring Metrics

Azure API Management (APIM) provides centralized monitoring as all AI requests flow through the gateway. Monitoring serves two distinct purposes: tracking metrics for shared infrastructure allocation and providing operational insights.

Metrics for Shared Infrastructure Allocation

These metrics are used to proportionally split shared infrastructure costs (APIM, App Gateway, networking): API call count per tenant and network egress bytes per tenant.

Operational Metrics (Not for Chargeback)

These metrics support capacity planning, SLA monitoring, and performance optimization: tokens consumed, pages processed, and request latency.

Monitoring Architecture

  1. APIM logs requests/responses with tenant-id to Azure Event Hubs
  2. Azure Functions process Event Hub messages to:
    • Count API calls per tenant (for APIM/Gateway allocation)
    • Sum network egress bytes per tenant (for data transfer allocation)
    • Extract operational metrics (tokens, pages, latency)
  3. Results stored in Log Analytics:
    • Allocation metrics used in monthly shared cost calculations
    • Operational metrics used for dashboards and SLA reporting
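The per-tenant aggregation in step 2 can be sketched as a pure function that the Azure Function would call on each batch of Event Hub messages. The event field names (`tenant_id`, `response_bytes`) are hypothetical; the actual APIM log schema is not defined on this page.

```python
from collections import defaultdict


def aggregate_usage(events):
    """Sum API calls and response bytes per tenant from gateway log events."""
    usage = defaultdict(lambda: {"api_calls": 0, "egress_bytes": 0})
    for event in events:
        tenant = event.get("tenant_id")  # hypothetical field name
        if not tenant:
            continue  # skip events that cannot be attributed to a tenant
        usage[tenant]["api_calls"] += 1
        usage[tenant]["egress_bytes"] += event.get("response_bytes", 0)
    return dict(usage)


sample_events = [
    {"tenant_id": "wlrs", "response_bytes": 2048},
    {"tenant_id": "wlrs", "response_bytes": 1024},
    {"tenant_id": "sdpr", "response_bytes": 512},
    {"response_bytes": 100},  # no tenant id: ignored
]
usage = aggregate_usage(sample_events)
print(usage)
```

Keeping the aggregation free of Azure SDK calls means it can be unit-tested without an Event Hub, and the Function wrapper only has to decode messages and write the result to Log Analytics.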

Resource Tagging and Cost Allocation

AI Foundry Projects (Direct Attribution)

Each tenant receives a dedicated Foundry project with isolated resources. Azure automatically bills all consumption (Azure OpenAI tokens, Cosmos DB, AI Search) to the project.

Tagging strategy:

Project: "tenant-wlrs-water-permits"
Tags:
- tenant-id: "wlrs"
- cost-center: "CC-NRM-WLRS"
- department: "Natural-Resources"
- environment: "production"
- service-tier: "standard"

Cost allocation: Direct attribution via Azure Cost Management tag filtering. Query Tags["tenant-id"] == "wlrs" shows all Foundry project costs, including Azure OpenAI tokens, Cosmos DB, and AI Search.

No manual calculation is needed, because Azure bills these resources directly to each project.

Document Intelligence (Direct Attribution)

Architectural Decision: Deploy one dedicated Document Intelligence resource per tenant

Rationale:

  1. Simplified cost allocation: Direct attribution via tags, no proportional calculation needed
  2. Performance isolation: No "noisy neighbor" concerns with dedicated resources
  3. Compliance: Easier to meet ministry-specific data residency requirements
  4. Scaling: Each tenant can independently scale their DI instance

Resources:

Tags (on each DI resource):

Tags:
- tenant-id: "wlrs" OR "sdpr"
- shared-service: "no"
- resource-type: "document-intelligence"
- managed-by: "ai-services-hub"

Cost allocation: Direct attribution via Azure Cost Management tag filtering. Azure Cost Management query: Tags["tenant-id"] == "wlrs" shows WLRS's DI costs (page processing charges) directly.

Usage tracking (for operational metrics only):

Implementation:


Infrastructure and Platform Costs (Proportional Allocation)

These shared resources serve all tenants and require proportional cost allocation based on usage metrics.

App Gateway/WAF

Tags:

Tags:
- shared-service: "yes"
- resource-type: "app-gateway-waf-v2"
- allocation-method: "request-count-proportional"

Cost structure:

Allocation method: Proportional based on request count from App Gateway access logs

APIM V2

Tags:

Tags:
- shared-service: "yes"
- allocation-method: "api-call-proportional"

Cost structure: $1,000-2,000/month depending on tier

Allocation method: Based on API call volume per tenant from Event Hubs

AI Foundry Hub Dependencies

Storage Account (Foundry hub-level):

Application Insights/Log Analytics:

Key Vault:

Network Egress

Cost structure:

Risk in dual-region setup: Cross-region traffic between Canada East (Foundry) and Canada Central (APIM) incurs egress charges

Tags:

Tags:
- traffic-source: "canada-east-foundry"
- traffic-destination: "canada-central-apim"
- allocation-method: "tenant-response-bytes"

Tracking mechanism: App Gateway diagnostic logs (not Azure Cost Management tags)

Note: Egress is billed at the subscription level and requires a custom calculation from logs (see the implementation section below).

Regional Cost Tracking

The dual-region deployment (Canada Central for APIM/App Gateway, Canada East for Foundry) requires regional cost tracking.

All resources must include:

Tags:
- deployment-region: "canada-central" OR "canada-east"
- primary-region: "canada-central"

Why this matters:

  1. Pricing variations: Some Azure services have different pricing between Canada Central and Canada East
  2. Cross-region attribution: When WLRS uses Foundry in Canada East but APIM in Canada Central, costs must be attributed correctly
  3. Egress tracking: Cross-region traffic needs regional source/destination tags
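Since cost attribution depends on these tags being present, a small check run before deployment (or over an inventory export) can catch resources that would otherwise fall out of the reports. The sketch below is illustrative only: the function name is hypothetical, and the rule that dedicated resources need tenant-id while shared ones need shared-service is inferred from the tag examples on this page.

```python
REGION_TAGS = {"deployment-region", "primary-region"}
ALLOWED_REGIONS = {"canada-central", "canada-east"}


def tag_problems(tags, dedicated=True):
    """Return a list of tagging problems for one resource (empty if none)."""
    required = set(REGION_TAGS)
    # Assumption: dedicated resources carry tenant-id, shared ones carry shared-service.
    required.add("tenant-id" if dedicated else "shared-service")

    problems = [f"missing tag: {name}" for name in sorted(required - set(tags))]
    region = tags.get("deployment-region")
    if region is not None and region not in ALLOWED_REGIONS:
        problems.append(f"unexpected deployment-region: {region}")
    return problems


print(tag_problems({
    "tenant-id": "wlrs",
    "deployment-region": "canada-east",
    "primary-region": "canada-central",
}))
print(tag_problems({"shared-service": "yes", "deployment-region": "west-us"}, dedicated=False))
```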

Chargeback Metrics Summary

Monthly tenant invoices combine two cost categories:

Direct Costs (Azure-Billed, No Calculation Needed)

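Retrieved via Azure Cost Management tag filtering (tenant-id): Foundry project costs (Azure OpenAI tokens, Cosmos DB, AI Search) and Document Intelligence page processing.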

Allocated Costs (Calculated from Usage Metrics)

Proportionally split based on tenant usage: APIM and App Gateway (by API call or request count) and network egress (by response bytes).


Implementation: Cost Calculation Methods

Method 1: Azure Cost Management (Built-in, No Code)

Azure Cost Management handles all direct attribution automatically. No custom code needed for tenant-dedicated resources.

Step 1: Enable Tag Inheritance
Azure Portal → Cost Management → Settings → Configuration
→ Enable "Automatically apply subscription and resource group tags to new data"

This propagates tags from subscriptions/resource groups down to individual usage records.

Step 2: View Direct Attribution Costs
Cost Analysis → Add Filter → Tag
→ Select "tenant-id" → Choose "wlrs"

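This shows all costs where tenant-id = wlrs, including Foundry project resources (Azure OpenAI tokens, Cosmos DB, AI Search) and the tenant's Document Intelligence instance.

This is direct attribution: Azure calculates it automatically based on actual consumption.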

Step 3: Create Cost Allocation Rules for Shared Resources

This is where shared infrastructure costs (APIM, App Gateway) get split:

Cost Management → Cost Allocation Rules → Add Rule

SOURCE (what to split):
- Resource Group: "rg-ai-hub-shared-infra"
- Tag filter: shared-service = "yes"

TARGETS (who receives the split):
- Tag: tenant-id = "wlrs"
- Tag: tenant-id = "sdpr"
- Tag: tenant-id = "other-ministry"

ALLOCATION METHOD:
Option A: "Distribute evenly" → Each tenant gets 33.33%
Option B: "Total cost proportional" → Split based on each tenant's existing costs
Option C: "Custom percentage" → Manually set: WLRS 45%, SDPR 30%, Others 25%

Limitation: The "proportional" options only work based on existing Azure costs, not custom metrics like "API call count". For usage-based allocation, use Method 2 below.
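To compare what the three allocation options would produce for the same shared bill, the splits can be sketched in a few lines. This is not how Cost Management computes them internally; the function names are illustrative, and the $1,400 cost for "other-ministry" is a made-up input for the example.

```python
def distribute_evenly(shared_cost, tenants):
    """Option A: every tenant receives the same share."""
    return {tenant: shared_cost / len(tenants) for tenant in tenants}


def proportional_to_cost(shared_cost, existing_costs):
    """Option B: split in proportion to each tenant's existing Azure costs."""
    total = sum(existing_costs.values())
    return {tenant: shared_cost * cost / total for tenant, cost in existing_costs.items()}


def custom_percentage(shared_cost, percentages):
    """Option C: split by manually chosen percentages that must sum to 100."""
    if abs(sum(percentages.values()) - 100) > 1e-9:
        raise ValueError("percentages must sum to 100")
    return {tenant: shared_cost * pct / 100 for tenant, pct in percentages.items()}


shared = 1900  # example shared bill: APIM $1,500 + App Gateway $400
print(distribute_evenly(shared, ["wlrs", "sdpr", "other-ministry"]))
print(proportional_to_cost(shared, {"wlrs": 4200, "sdpr": 2400, "other-ministry": 1400}))
print(custom_percentage(shared, {"wlrs": 45, "sdpr": 30, "other-ministry": 25}))
```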

Result: Monthly report showing allocated costs per tenant:


Method 2: Custom Calculation (Usage-Based Allocation)

For proportional allocation based on usage metrics (API calls, egress bytes), custom code is required.

Architecture

Event Hubs (usage data)
→ Azure Function (calculate percentages)
→ Log Analytics (store allocation data)
→ Power BI/Cost Dashboard (reporting)

Step 1: Calculate Tenant Usage Percentages

Azure Function runs monthly (triggered by timer):

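# Monthly allocation calculation (sketch; query execution is injected by the caller)
from typing import Callable

# Per-tenant API call counts for the current month, from the processed Event Hub data
REQUEST_COUNT_QUERY = """
customEvents
| where timestamp >= startofmonth(now()) and timestamp < startofmonth(now(), 1)
| where name == "APIM-Request-Log"
| summarize RequestCount = count() by TenantId
"""


def calculate_allocations(run_query: Callable[[str], dict[str, int]]) -> dict[str, float]:
    """Return each tenant's share of total requests, as a percentage."""
    # run_query is a placeholder for whichever Log Analytics client the Function uses;
    # it must return {tenant_id: request_count}.
    results = run_query(REQUEST_COUNT_QUERY)
    # Example results: {"wlrs": 45000, "sdpr": 30000, "others": 25000}

    total_requests = sum(results.values())  # 100,000 in the example
    if total_requests == 0:
        return {tenant: 0.0 for tenant in results}  # avoid division by zero

    # Example result: {"wlrs": 45.0, "sdpr": 30.0, "others": 25.0}
    return {tenant: count / total_requests * 100 for tenant, count in results.items()}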
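
Step 2: Query Azure Cost Management API for Shared Resource Costs

from azure.identity import DefaultAzureCredential
from azure.mgmt.costmanagement import CostManagementClient

cost_client = CostManagementClient(DefaultAzureCredential())
scope = "/subscriptions/<subscription-id>"  # placeholder: scope that holds the shared resources

# Get actual month-to-date costs for shared resources, grouped by resource type
cost_query = {
    "type": "ActualCost",
    "timeframe": "MonthToDate",
    "dataset": {
        "granularity": "None",
        "aggregation": {"totalCost": {"name": "PreTaxCost", "function": "Sum"}},
        "grouping": [{"type": "Dimension", "name": "ResourceType"}],
        "filter": {
            "tags": {"name": "shared-service", "operator": "In", "values": ["yes"]}
        },
    },
}

shared_costs = cost_client.query.usage(scope, cost_query)
# Example result: APIM = $1,500, App Gateway = $400, Total = $1,900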

Step 3: Apply Allocation Percentages

# Calculate each tenant's share of shared infrastructure
apim_cost = 1500
app_gateway_cost = 400
shares = {"wlrs": 0.45, "sdpr": 0.30, "others": 0.25}  # percentages from Step 1, as fractions

tenant_allocations = {
    tenant: {
        "apim": apim_cost * share,
        "gateway": app_gateway_cost * share,
        "total_shared": (apim_cost + app_gateway_cost) * share,
    }
    for tenant, share in shares.items()
}
# wlrs:   apim $675, gateway $180, total_shared $855
# sdpr:   apim $450, gateway $120, total_shared $570
# others: apim $375, gateway $100, total_shared $475
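
Step 4: Write Allocation Results to Log Analytics

# Store calculated allocations for reporting.
# Sketch using the Logs Ingestion API; the endpoint, rule ID, and stream name are placeholders.
from azure.identity import DefaultAzureCredential
from azure.monitor.ingestion import LogsIngestionClient

ingestion_client = LogsIngestionClient(
    endpoint="<data-collection-endpoint>",
    credential=DefaultAzureCredential(),
)
ingestion_client.upload(
    rule_id="<data-collection-rule-immutable-id>",
    stream_name="Custom-SharedCostAllocation_CL",
    logs=[
        {
            "Month": "2026-01",
            "TenantId": "wlrs",
            "APIManagement": 675,
            "AppGateway": 180,
            "NetworkEgress": 0,  # Calculated separately in Step 5
            "TotalSharedCost": 855,
            "AllocationMethod": "api-call-proportional",
        },
        # ... repeat for other tenants
    ],
)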

Step 5: Calculate Network Egress Costs

Network egress is billed at the subscription level and requires custom calculation from diagnostic logs.

Enable Network Diagnostics:

App Gateway → Diagnostic Settings → Send to Log Analytics
→ Enable "Access Logs" (contains bytes sent per request)

Query Logs to Calculate Per-Tenant Egress:

AzureDiagnostics
| where TimeGenerated >= startofmonth(now()) // Current month only
| where ResourceType == "APPLICATIONGATEWAYS"
| where Category == "ApplicationGatewayAccessLog"
| extend TenantId = extract("tenant=([^&]+)", 1, requestUri_s) // Parse from URL
| summarize TotalEgressGB = sum(sentBytes_d) / 1073741824 by TenantId
| extend EgressCost = case(
    TotalEgressGB <= 100, 0.0,                              // First 100 GB free
    TotalEgressGB <= 10240, (TotalEgressGB - 100) * 0.087,  // $0.087/GB
    (10140 * 0.087) + ((TotalEgressGB - 10240) * 0.067)     // Tiered: $0.067/GB above 10,240 GB
)

Write these egress costs to the `SharedCostAllocation` table alongside other shared infrastructure costs.
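The tiering logic in the query can also be expressed as a small function, which is convenient for unit-testing the boundaries before the costs are written to the table. The thresholds and rates are copied from the query above; the function and constant names are illustrative.

```python
FREE_GB = 100           # first 100 GB are free
TIER1_LIMIT_GB = 10240  # upper bound of the first paid tier
TIER1_RATE = 0.087      # $ per GB up to the tier-1 limit
TIER2_RATE = 0.067      # $ per GB beyond the tier-1 limit


def egress_cost(total_gb):
    """Return the tiered egress cost in dollars for a given volume in GB."""
    if total_gb <= FREE_GB:
        return 0.0
    if total_gb <= TIER1_LIMIT_GB:
        return (total_gb - FREE_GB) * TIER1_RATE
    tier1_full = (TIER1_LIMIT_GB - FREE_GB) * TIER1_RATE
    return tier1_full + (total_gb - TIER1_LIMIT_GB) * TIER2_RATE


for gb in (50, 600, 11240):
    print(gb, round(egress_cost(gb), 2))
```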

Step 6: Combine All Costs in Final Chargeback Report

Monthly Kusto query for chargeback:

// Direct costs from Azure Cost Management (Foundry projects, DI instances)
let DirectCosts = AzureCosts
| extend TenantId = tostring(Tags["tenant-id"])
| where isnotempty(TenantId)
| summarize DirectCost = sum(Cost) by TenantId;

// Allocated shared infrastructure costs from custom calculation
// Month is stored as a "yyyy-MM" string (see Step 4)
let AllocatedCosts = SharedCostAllocation_CL
| where Month == format_datetime(startofmonth(now()), "yyyy-MM")
| summarize AllocatedCost = sum(TotalSharedCost) by TenantId;

// Final chargeback report
DirectCosts
| join kind=leftouter AllocatedCosts on TenantId
| extend AllocatedCost = coalesce(todouble(AllocatedCost), 0.0) // Tenants with no allocation row
| extend TotalChargeback = DirectCost + AllocatedCost
| project TenantId, DirectCost, AllocatedCost, TotalChargeback

Output:

TenantId   DirectCost   AllocatedCost   TotalChargeback
wlrs       $4,200       $855            $5,055
sdpr       $2,400       $570            $2,970

Note: DirectCost includes all Azure OpenAI token consumption, Document Intelligence page processing, Cosmos DB, AI Search, and storage—automatically calculated by Azure based on actual usage.
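For checking the report outside Log Analytics, the same left-outer combination can be sketched over two plain dictionaries. The function name is illustrative, and the "new-tenant" entry is a made-up input showing a tenant with direct costs but no allocation row yet.

```python
def chargeback_report(direct_costs, allocated_costs):
    """Combine direct and allocated costs per tenant; missing allocations count as 0."""
    rows = []
    for tenant, direct in sorted(direct_costs.items()):
        allocated = allocated_costs.get(tenant, 0)
        rows.append({
            "TenantId": tenant,
            "DirectCost": direct,
            "AllocatedCost": allocated,
            "TotalChargeback": direct + allocated,
        })
    return rows


report = chargeback_report(
    direct_costs={"wlrs": 4200, "sdpr": 2400, "new-tenant": 100},
    allocated_costs={"wlrs": 855, "sdpr": 570},
)
for row in report:
    print(row)
```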


Summary

This document establishes a comprehensive cost tracking framework for the AI Services Hub using:

  1. Direct attribution for tenant-dedicated resources (Foundry projects with Azure OpenAI/Cosmos/AI Search, Document Intelligence instances)—Azure automatically bills these based on actual consumption
  2. Proportional allocation for shared infrastructure (APIM, App Gateway, monitoring)—custom calculations split costs based on API call volume and network usage
  3. Minimal custom tracking via Event Hubs and Azure Functions—only needed for shared infrastructure allocation, not for AI service consumption
  4. Dual calculation methods: Azure Cost Management for tag-based direct attribution (roughly 85% of costs, per the breakdown earlier on this page), and custom code for usage-based shared infrastructure allocation (roughly 15% of costs)

The implementation provides full cost transparency while minimizing operational overhead—most costs are automatically tracked by Azure, with custom calculations only for shared platform services.
