Cost Tracking

Work in Progress
This page is still being refined. The cost tracking design and chargeback process are not yet finalised. See the cost allocation decision record for the current direction.

This page explains how platform costs are tracked, how shared costs are split across teams, and which parts of the design are still being worked out. It is written for readers who want to understand the operating model before they dive into detailed formulas and example queries.

Architecture

What Each Project Gets

Resource	Type	Description
Resource Group	Dedicated	rg-{project}-{env} - contains all project resources
Storage Account	Dedicated	Project documents and data
Search Service	Dedicated	Project indexes for RAG
Key Vault	Dedicated	Project secrets and keys
AI Foundry Project	Dedicated	Project's own API endpoint, prompt flows, index (within shared Hub)
APIM Subscription	Dedicated	Project's API key for access
AI Models (GPT-4, etc.)	Shared	Deployed once in Hub, accessed via AI Foundry Project
APIM, App Gateway, Firewall	Shared	Entry point infrastructure
Azure Proxy (Chisel Server)	Shared	App Service running Chisel SOCKS5 proxy for CI/CD private-endpoint access (always-on)
CI/CD Runners (GitHub self-hosted)	Optional	Container Apps job + ACR + Log Analytics; provisioned only when `github_runners_aca_enabled = true`; not used by this platform’s own CI/CD

Terraform Deploys Everything

Tenant stacks are deployed with deploy-scaled.sh, which runs Terraform across five isolated state files. AVM resource modules are used where available; raw Terraform resources are used when no AVM module exists.

# deploy-scaled.sh executes 3 phases:
# Phase 1: shared foundation (network, KV, AppGW, WAF)
# Phase 2: per-tenant stacks (parallel, isolated state)
# Phase 3: foundry + apim + tenant-user-mgmt
#
# Each stack corresponds to an infra-ai-hub/stacks/* directory.

AVM Modules Used

Resource	AVM Module
Storage Account	`avm-res-storage-storageaccount`
Search Service	`avm-res-search-searchservice`
Key Vault	`avm-res-keyvault-vault`
AI Foundry Hub	`avm-res-machinelearningservices-workspace`
API Management	`avm-res-apimanagement-service`
Application Gateway	`avm-res-network-applicationgateway`
Virtual Network	`avm-res-network-virtualnetwork`

How Cost Tracking Works

Direct Costs

Storage, Search, Key Vault

Tracked automatically by Azure Cost Management using Resource Group tags.

~40% of total

AI Usage

Tokens, API calls

Tracked by AI Foundry per project. Each AI Foundry Project has its own metrics.

~45% of total

Platform Split

APIM, Gateway, Firewall

Split evenly across all projects (shared infrastructure).

~15% of total

CI/CD Runner Costs (Self-hosted GitHub runners)

The optional github_runners_aca module provisions self-hosted runners inside the VNet for tenant CI/CD pipelines that need private endpoint access. Runners run as Azure Container Apps jobs (scale-to-zero) with supporting services (ACR + Log Analytics). Note: this repo's own CI/CD uses public GitHub runners + the Chisel tunnel instead.

Cost Drivers

Runner compute: Container Apps job runtime (primary driver; usage-based)
Concurrency cap: max_runners (default 4)
Sizing: container_cpu (default 1 vCPU) and container_memory (default 2Gi)
Logs: Log Analytics ingestion (GB/month)
Images: ACR Premium base cost (fixed) + storage/transfer (Premium is required for Private Link / private endpoints)

Sample Calculation (module defaults)

Assumptions: 20 jobs/day, 10 min/job, avg concurrency = 1.5

Monthly runner hours = 30 * 20 * (10/60) * 1.5 = 150 hours
Total seconds = 150 * 3600 = 540,000 seconds

Canada Central rates (Container Apps):
vCPU-seconds: $0.0000480/sec
GiB-seconds:  $0.0000057/sec
Requests:     $0.565 per million

vCPU cost = 540,000 * 1 vCPU * 0.0000480 = $25.92
Memory cost = 540,000 * 2 GiB * 0.0000057 = $6.16
Request cost (example 1M req/mo) = $0.57

Estimated runner compute total ≈ $32.65/mo

ACR Premium base cost = $2.351/day ≈ $70.53/mo (30-day month)
Estimated compute + ACR base ≈ $103.18/mo

Assumes a single-region ACR Premium registry with no geo-replication and no connected registry. Add Log Analytics ingestion (GB/month) and any ACR transfer/storage beyond the included 500 GB for a full estimate.
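The sample calculation can be reproduced with a short script, which makes it easier to vary the assumptions. This is a minimal sketch: the function and parameter names are illustrative, and the rates are the ones quoted above, so check them against current pricing before relying on the output.

```python
def estimate_runner_cost(
    jobs_per_day: float = 20,
    minutes_per_job: float = 10,
    avg_concurrency: float = 1.5,
    vcpu: float = 1.0,             # container_cpu default
    memory_gib: float = 2.0,       # container_memory default
    days: int = 30,
    vcpu_rate: float = 0.0000480,  # $ per vCPU-second (rate quoted above)
    gib_rate: float = 0.0000057,   # $ per GiB-second (rate quoted above)
    request_cost: float = 0.57,    # example: 1M requests per month
    acr_daily: float = 2.351,      # ACR Premium base cost per day
) -> dict:
    """Reproduce the sample calculation; every input is an assumption to adjust."""
    seconds = days * jobs_per_day * minutes_per_job * 60 * avg_concurrency
    vcpu_cost = round(seconds * vcpu * vcpu_rate, 2)
    memory_cost = round(seconds * memory_gib * gib_rate, 2)
    compute = round(vcpu_cost + memory_cost + request_cost, 2)
    acr = round(acr_daily * days, 2)
    return {
        "seconds": seconds,
        "vcpu_cost": vcpu_cost,
        "memory_cost": memory_cost,
        "compute": compute,
        "acr": acr,
        "total": round(compute + acr, 2),
    }


estimate = estimate_runner_cost()
print(estimate["compute"], estimate["total"])
```

Log Analytics ingestion and ACR storage or transfer are deliberately left out, matching the scope of the figures above.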

Example Monthly Costs

Health-RAG Project

Storage + Search + KV	$420	DIRECT
AI usage (60% of tokens)	$1,020	AI
Platform share (50%)	$300	SPLIT
Total	$1,740

SDPR-Chatbot Project

Storage + Search + KV	$350	DIRECT
AI usage (40% of tokens)	$680	AI
Platform share (50%)	$300	SPLIT
Total	$1,330
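The two example bills follow the same three-part formula: direct costs, a token-weighted share of AI usage, and an even share of the platform. The sketch below reproduces them. The totals of $1,700 for AI usage and $600 for the platform are not stated on this page; they are inferred by adding the two projects' figures, and the function name is illustrative.

```python
def project_monthly_cost(direct, token_share, ai_total, platform_total, project_count):
    """Combine direct, AI (by token share) and platform (even split) costs."""
    ai = round(ai_total * token_share, 2)
    platform = round(platform_total / project_count, 2)
    return {
        "direct": direct,
        "ai": ai,
        "platform": platform,
        "total": round(direct + ai + platform, 2),
    }


AI_TOTAL = 1700        # inferred: $1,020 + $680
PLATFORM_TOTAL = 600   # inferred: $300 + $300

health_rag = project_monthly_cost(420, 0.60, AI_TOTAL, PLATFORM_TOTAL, 2)
sdpr_chatbot = project_monthly_cost(350, 0.40, AI_TOTAL, PLATFORM_TOTAL, 2)
print(health_rag["total"], sdpr_chatbot["total"])
```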

Usage Monitoring, Cost Allocation, and Chargeback Metrics

Introduction

This document defines the approach for implementing Usage Monitoring, Cost Allocation, and Chargeback Metrics for the BC Government AI Services Hub multi-tenant platform. These three interconnected capabilities are essential for operating a shared AI infrastructure that serves multiple ministries while maintaining cost transparency and accountability.

Key Concepts

Usage Monitoring tracks resource consumption at the tenant level to support both cost allocation and operational insights. For the AI Services Hub, monitoring serves two purposes: capturing metrics for shared infrastructure allocation (APIM, App Gateway) and providing operational visibility into service usage patterns.

Cost Allocation combines Azure's native cost tracking with custom calculations for shared resources. The hub architecture uses two allocation models: direct attribution for tenant-dedicated resources (AI Foundry projects with their own Azure OpenAI, Cosmos DB, and AI Search; dedicated Document Intelligence instances), and proportional allocation for shared infrastructure (APIM, App Gateway, monitoring services) where costs are split based on actual usage percentages.

Chargeback Metrics aggregate all cost components—direct resource costs from Azure billing and allocated shared infrastructure costs—into consolidated monthly invoices per tenant.

Document Scope

This document covers:

Tagging strategies for both dedicated and shared resources across the dual-region deployment (Canada Central/East)
Usage tracking pipelines using APIM, Event Hubs, and Azure Functions to capture metrics for shared infrastructure allocation
Cost calculation methods combining Azure Cost Management (for direct attribution) with custom allocation logic (for shared infrastructure)
Implementation patterns with code examples for usage tracking, cost allocation functions, and Kusto queries for chargeback reporting
Network egress tracking to handle cross-region costs between Canada Central (APIM/App Gateway) and Canada East (AI Foundry)

The approach leverages APIM as the central governance point where all AI requests flow—regardless of whether backend services are inside the Foundry landing zone (OpenAI, AI Search) or outside it (Document Intelligence). APIM routing policies direct each tenant to their appropriate backend resources, enabling consistent tenant identification for shared infrastructure cost allocation.

Usage Monitoring Metrics

Azure API Management (APIM) provides centralized monitoring as all AI requests flow through the gateway. Monitoring serves two distinct purposes: tracking metrics for shared infrastructure allocation and providing operational insights.

Metrics for Shared Infrastructure Allocation

These metrics are used to proportionally split shared infrastructure costs (APIM, App Gateway, networking):

API call volume: Request count per tenant—used to allocate APIM and App Gateway costs
Network egress bytes: Response payload size per tenant—used to allocate cross-region data transfer costs
Log ingestion (GB): Application Insights data volume per tenant—used to allocate monitoring costs

Operational Metrics (Not for Chargeback)

These metrics support capacity planning, SLA monitoring, and performance optimization:

Token consumption breakdown: Which models each tenant uses and token volume (for capacity planning)
Document Intelligence pages: Pages processed per tenant (for capacity planning)
Query latency: Response times per service/tenant
Error rates: Failed requests for support escalation
Concurrent connections: Active sessions per tenant

Monitoring Architecture

APIM logs requests/responses with tenant-id to Azure Event Hubs
Azure Functions process Event Hub messages to:
- Count API calls per tenant (for APIM/Gateway allocation)
- Sum network egress bytes per tenant (for data transfer allocation)
- Extract operational metrics (tokens, pages, latency)
Results are stored in Log Analytics:
- Allocation metrics used in monthly shared cost calculations
- Operational metrics used for dashboards and SLA reporting
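The aggregation step of this pipeline can be sketched in plain Python. The event shape used here (`tenant_id` and `response_bytes` keys) is an assumption for illustration; the actual payload depends on how the APIM logger policy is configured.

```python
from collections import defaultdict


def aggregate_usage(events):
    """Count API calls and sum response bytes per tenant."""
    usage = defaultdict(lambda: {"api_calls": 0, "egress_bytes": 0})
    for event in events:
        tenant = event.get("tenant_id")
        if not tenant:
            # Records without a tenant cannot be allocated; a real
            # pipeline would probably log or count them separately.
            continue
        usage[tenant]["api_calls"] += 1
        usage[tenant]["egress_bytes"] += int(event.get("response_bytes", 0))
    return dict(usage)


# Hypothetical messages as they might arrive from Event Hubs
sample_events = [
    {"tenant_id": "wlrs", "response_bytes": 2048},
    {"tenant_id": "wlrs", "response_bytes": 1024},
    {"tenant_id": "sdpr", "response_bytes": 512},
    {"response_bytes": 100},  # no tenant-id: skipped
]
usage = aggregate_usage(sample_events)
print(usage)
```

Keeping the aggregation as a pure function makes it easy to unit test separately from the Event Hubs trigger.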

Resource Tagging and Cost Allocation

AI Foundry Projects (Direct Attribution)

Each tenant receives a dedicated Foundry project with isolated resources. Azure automatically bills all consumption (Azure OpenAI tokens, Cosmos DB, AI Search) to the project.

Tagging strategy:

Project: "tenant-wlrs-water-permits"
Tags:
- tenant-id: "wlrs"
- cost-center: "CC-NRM-WLRS"
- department: "Natural-Resources"
- environment: "production"
- service-tier: "standard"

Cost allocation: Direct attribution via Azure Cost Management tag filtering. The filter Tags["tenant-id"] == "wlrs" shows all Foundry project costs, including:

Azure OpenAI token consumption (all models, prompt/completion/safety tokens)
Cosmos DB storage and operations
AI Search queries and indexing
Foundry artifact storage

No manual calculation needed—Azure bills these resources directly to each project.
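To make the tag filter concrete, the sketch below applies the same rule to a list of cost rows in memory. The row structure and the dollar amounts are invented for illustration; Azure Cost Management performs this filtering itself, so no such code is needed in practice.

```python
def direct_cost_for_tenant(cost_rows, tenant_id):
    """Sum the cost of every row whose tenant-id tag matches tenant_id."""
    return round(
        sum(
            row["cost"]
            for row in cost_rows
            if row.get("tags", {}).get("tenant-id") == tenant_id
        ),
        2,
    )


# Illustrative rows only; resource names and amounts are made up
cost_rows = [
    {"resource": "openai", "cost": 3100.00, "tags": {"tenant-id": "wlrs"}},
    {"resource": "cosmos", "cost": 600.00, "tags": {"tenant-id": "wlrs"}},
    {"resource": "search", "cost": 500.00, "tags": {"tenant-id": "wlrs"}},
    {"resource": "docint-sdpr", "cost": 250.00, "tags": {"tenant-id": "sdpr"}},
    {"resource": "apim", "cost": 1500.00, "tags": {"shared-service": "yes"}},
]
print(direct_cost_for_tenant(cost_rows, "wlrs"))
```

Shared resources carry no tenant-id tag, so they fall out of every tenant's direct total and are handled by proportional allocation instead.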

Document Intelligence (Direct Attribution)

Architectural Decision: Deploy one dedicated Document Intelligence resource per tenant

Rationale:

Simplified cost allocation: Direct attribution via tags, no proportional calculation needed
Performance isolation: No "noisy neighbor" concerns with dedicated resources
Compliance: Easier to meet ministry-specific data residency requirements
Scaling: Each tenant can independently scale their DI instance

Resources:

docint-wlrs (Canada Central)
docint-sdpr (Canada Central)

Tags (on each DI resource):

Tags:
- tenant-id: "wlrs" OR "sdpr"
- shared-service: "no"
- resource-type: "document-intelligence"
- managed-by: "ai-services-hub"

Cost allocation: Direct attribution via Azure Cost Management tag filtering. The filter Tags["tenant-id"] == "wlrs" shows WLRS's DI costs (page processing charges) directly.

Usage tracking (for operational metrics only):

APIM logs pages processed per tenant to Event Hubs
Used for: capacity planning, SLA monitoring, usage trends
NOT used for cost allocation (costs already directly attributed via tags)

Implementation:

Deploy DI resources in same subscription as AI Services Hub
Configure APIM backends pointing to each DI instance
Use APIM set-backend-service policy for tenant routing

Infrastructure and Platform Costs (Proportional Allocation)

These shared resources serve all tenants and require proportional cost allocation based on usage metrics.

App Gateway/WAF

Tags:

Tags:
- shared-service: "yes"
- resource-type: "app-gateway-waf-v2"
- allocation-method: "request-count-proportional"

Cost structure:

Fixed: ~$323/month (split equally across active tenants)
Variable: Allocated by capacity units consumed

Allocation method: Proportional based on request count from App Gateway access logs
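A minimal sketch of this two-part split follows: the fixed cost is divided equally and the variable cost is divided by request count. It treats request count as a stand-in for capacity units consumed, which is an assumption, and the $77 variable cost is a made-up example figure.

```python
def allocate_app_gateway(fixed_cost, variable_cost, requests_by_tenant):
    """Split fixed cost equally and variable cost by share of requests."""
    tenant_count = len(requests_by_tenant)
    total_requests = sum(requests_by_tenant.values())
    allocation = {}
    for tenant, count in requests_by_tenant.items():
        fixed_share = fixed_cost / tenant_count
        if total_requests:
            variable_share = variable_cost * count / total_requests
        else:
            # No traffic at all: fall back to an equal split
            variable_share = variable_cost / tenant_count
        allocation[tenant] = round(fixed_share + variable_share, 2)
    return allocation


gateway_alloc = allocate_app_gateway(
    fixed_cost=323,      # fixed monthly cost quoted above
    variable_cost=77,    # hypothetical variable cost for the month
    requests_by_tenant={"wlrs": 45000, "sdpr": 30000, "others": 25000},
)
print(gateway_alloc)
```

Because each share is rounded to cents, the per-tenant amounts can differ from the billed total by a cent or two; a production version would need a rule for assigning that remainder.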

APIM V2

Tags:

Tags:
- shared-service: "yes"
- allocation-method: "api-call-proportional"

Cost structure: $1,000-2,000/month depending on tier

Allocation method: Based on API call volume per tenant from Event Hubs

AI Foundry Hub Dependencies

Storage Account (Foundry hub-level):

Shared storage account for Foundry hub artifacts, flows, evaluations
Tags:
- foundry-dependency: "hub-storage"
- allocated-to: "all-projects"
Allocation method: Split equally across active projects OR by storage consumption if measurable
Note: Individual project storage is billed directly to each project (direct attribution)

Application Insights/Log Analytics:

Required for Foundry monitoring
Cost based on data ingestion volume (GB)
Tags:
- monitoring-service: "foundry"
- allocation-method: "proportional"
Allocation method: By log volume per project (if tenant-id in logs)

Key Vault:

Stores connection strings for each project
Transaction-based pricing (~$0.03/10k transactions)
Tags:
- foundry-dependency: "secrets"
- shared-service: "yes"
Allocation method: By transaction count per project (minimal cost impact)

Network Egress

Cost structure:

First 100 GB/month free per region
Then tiered pricing: $0.087/GB (next 10 TB), $0.067/GB (next 40 TB), etc.

Risk in dual-region setup: Cross-region traffic between Canada East (Foundry) and Canada Central (APIM) incurs egress charges

Tags:

Tags:
- traffic-source: "canada-east-foundry"
- traffic-destination: "canada-central-apim"
- allocation-method: "tenant-response-bytes"

Tracking mechanism: App Gateway diagnostic logs (not Azure Cost Management tags)

Note: Egress is billed at the subscription level, so per-tenant costs require a custom calculation from logs (see Step 5 of Method 2 below).
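The tiered pricing above can be expressed as a small function. This sketch reads "next 10 TB" and "next 40 TB" as the 10,240 GB and 40,960 GB that follow the free allowance; that reading, the names used, and the decision to stop at the tiers listed are all assumptions. The Kusto query in Step 5 places the first paid boundary at 10,240 GB of total egress instead, so the two will differ slightly near that boundary.

```python
# (tier size in GB, price per GB); rates as quoted above
EGRESS_TIERS = [
    (100, 0.0),       # first 100 GB free
    (10240, 0.087),   # next 10 TB
    (40960, 0.067),   # next 40 TB
]


def egress_cost(total_gb):
    """Price a month's egress volume by walking the tiers in order."""
    if total_gb < 0:
        raise ValueError("total_gb must be non-negative")
    remaining = total_gb
    cost = 0.0
    for tier_size_gb, rate in EGRESS_TIERS:
        billed = min(remaining, tier_size_gb)
        cost += billed * rate
        remaining -= billed
    if remaining > 0:
        raise ValueError("usage exceeds the tiers modelled in this sketch")
    return round(cost, 2)


print(egress_cost(50))    # inside the free allowance
print(egress_cost(200))   # 100 GB free + 100 GB at $0.087
```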

Regional Cost Tracking

The dual-region deployment (Canada Central for APIM/App Gateway, Canada East for Foundry) requires regional cost tracking.

All resources must include:

Tags:
- deployment-region: "canada-central" OR "canada-east"
- primary-region: "canada-central"

Why this matters:

Pricing variations: Some Azure services have different pricing between Canada Central and Canada East
Cross-region attribution: When WLRS uses Foundry in Canada East but APIM in Canada Central, costs must be attributed correctly
Egress tracking: Cross-region traffic needs regional source/destination tags
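A tagging rule like this is easy to check automatically, for example in CI against planned resources. The sketch below validates only the two regional tags described above; the function name and the message wording are illustrative, and how the tag dictionaries are obtained (Terraform plan output, Azure Resource Graph, and so on) is left open.

```python
REQUIRED_TAGS = {"deployment-region", "primary-region"}
ALLOWED_REGIONS = {"canada-central", "canada-east"}


def tag_problems(tags):
    """Return a list of human-readable problems with a resource's tags."""
    problems = [
        f"missing tag: {name}" for name in sorted(REQUIRED_TAGS - set(tags))
    ]
    region = tags.get("deployment-region")
    if region is not None and region not in ALLOWED_REGIONS:
        problems.append(f"unexpected deployment-region: {region}")
    return problems


ok = tag_problems({
    "tenant-id": "wlrs",
    "deployment-region": "canada-east",
    "primary-region": "canada-central",
})
bad = tag_problems({"shared-service": "yes", "deployment-region": "west-us"})
print(ok, bad)
```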

Chargeback Metrics Summary

Monthly tenant invoices combine two cost categories:

Direct Costs (Azure-Billed, No Calculation Needed)

Retrieved via Azure Cost Management tag filtering (tenant-id):

AI Foundry project costs: All Azure OpenAI consumption (tokens), Cosmos DB, AI Search, project storage
Document Intelligence costs: Page processing charges for dedicated DI instance
Foundry compute costs: Any custom agent execution resources (if applicable)

Allocated Costs (Calculated from Usage Metrics)

Proportionally split based on tenant usage:

APIM costs: Split by API call volume per tenant
App Gateway costs: Split by request count per tenant
Network egress costs: Split by response bytes per tenant
Application Insights costs: Split by log ingestion volume per tenant
Shared storage costs: Split equally or by consumption across Foundry projects

Implementation: Cost Calculation Methods

Method 1: Azure Cost Management (Built-in, No Code)

Azure Cost Management handles all direct attribution automatically. No custom code needed for tenant-dedicated resources.

Step 1: Enable Tag Inheritance

Azure Portal → Cost Management → Settings → Configuration
→ Enable "Automatically apply subscription and resource group tags to new data"

This propagates tags from subscriptions/resource groups down to individual usage records.

Step 2: View Direct Attribution Costs

Cost Analysis → Add Filter → Tag
→ Select "tenant-id" → Choose "wlrs"

This shows all costs where tenant-id = wlrs, including:

Foundry project (Azure OpenAI tokens, Cosmos DB, AI Search, storage)
Document Intelligence (page processing)
Any other dedicated resources

This is direct attribution—Azure calculates it automatically based on actual consumption.

Step 3: Create Cost Allocation Rules for Shared Resources

This is where shared infrastructure costs (APIM, App Gateway) get split:

Cost Management → Cost Allocation Rules → Add Rule

SOURCE (what to split):
- Resource Group: "rg-ai-hub-shared-infra"
- Tag filter: shared-service = "yes"

TARGETS (who receives the split):
- Tag: tenant-id = "wlrs"
- Tag: tenant-id = "sdpr"
- Tag: tenant-id = "other-ministry"

ALLOCATION METHOD:
Option A: "Distribute evenly" → Each tenant gets 33.33%
Option B: "Total cost proportional" → Split based on each tenant's existing costs
Option C: "Custom percentage" → Manually set: WLRS 45%, SDPR 30%, Others 25%

Limitation: The "proportional" options only work based on existing Azure costs, not custom metrics like "API call count". For usage-based allocation, use Method 2 below.

Result: Monthly report showing allocated costs per tenant:

WLRS: $5,230 (includes $675 allocated from shared APIM)
SDPR: $3,100 (includes $450 allocated from shared APIM)
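The three allocation options differ only in how the weights are chosen. The sketch below shows the arithmetic for each. It is a local illustration, not how Azure Cost Management implements its rules, and the existing-cost figures passed to the proportional option are example values.

```python
def distribute_evenly(shared_cost, tenants):
    """Option A: every tenant receives the same share."""
    share = shared_cost / len(tenants)
    return {tenant: round(share, 2) for tenant in tenants}


def proportional_to_total_cost(shared_cost, existing_costs):
    """Option B: weight each tenant by the costs it already has."""
    total = sum(existing_costs.values())
    return {
        tenant: round(shared_cost * cost / total, 2)
        for tenant, cost in existing_costs.items()
    }


def custom_percentage(shared_cost, percentages):
    """Option C: weights are set by hand and must add up to 100."""
    if abs(sum(percentages.values()) - 100) > 1e-9:
        raise ValueError("percentages must sum to 100")
    return {
        tenant: round(shared_cost * pct / 100, 2)
        for tenant, pct in percentages.items()
    }


shared = 1500  # example shared APIM cost
even = distribute_evenly(shared, ["wlrs", "sdpr", "other-ministry"])
by_cost = proportional_to_total_cost(shared, {"wlrs": 4200, "sdpr": 2400})
custom = custom_percentage(shared, {"wlrs": 45, "sdpr": 30, "other-ministry": 25})
print(even, by_cost, custom)
```

None of the three options uses API call counts, which is why usage-based allocation needs the custom calculation in Method 2.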

Method 2: Custom Calculation (Usage-Based Allocation)

For proportional allocation based on usage metrics (API calls, egress bytes), custom code is required.

Architecture

Event Hubs (usage data)
→ Azure Function (calculate percentages)
→ Log Analytics (store allocation data)
→ Power BI/Cost Dashboard (reporting)

Step 1: Calculate Tenant Usage Percentages

Azure Function runs monthly (triggered by timer):

# Pseudo-code for monthly allocation calculation.
# kusto_client stands in for whichever Log Analytics query client the Function uses.
import kusto_client

# Count API calls per tenant for the current calendar month
query = """
customEvents
| where timestamp >= startofmonth(now()) and timestamp < startofmonth(now(), 1)
| where name == "APIM-Request-Log"
| summarize RequestCount = count() by TenantId
"""

results = kusto_client.execute(query)
# Example results: {"wlrs": 45000, "sdpr": 30000, "others": 25000}

total_requests = sum(results.values())  # 100,000
if total_requests == 0:
    raise ValueError("No APIM requests logged for this period")

# Calculate percentages for shared infrastructure allocation
allocations = {
    tenant: (count / total_requests) * 100
    for tenant, count in results.items()
}
# Result: {"wlrs": 45.0, "sdpr": 30.0, "others": 25.0}

Step 2: Query Azure Cost Management API for Shared Resource Costs

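from azure.identity import DefaultAzureCredential
from azure.mgmt.costmanagement import CostManagementClient

cost_client = CostManagementClient(DefaultAzureCredential())
scope = "/subscriptions/<subscription-id>"  # placeholder: set to the hub subscription

# Get actual month-to-date costs for shared resources, grouped by service.
# In the Cost Management query body the filter belongs inside "dataset".
cost_query = {
    "type": "ActualCost",
    "timeframe": "MonthToDate",
    "dataset": {
        "granularity": "None",
        "aggregation": {"totalCost": {"name": "Cost", "function": "Sum"}},
        "grouping": [{"type": "Dimension", "name": "ServiceName"}],
        "filter": {
            "tags": {
                "name": "shared-service",
                "operator": "In",
                "values": ["yes"]
            }
        }
    }
}

shared_costs = cost_client.query.usage(scope, cost_query)
# Example result: APIM = $1,500, App Gateway = $400, Total = $1,900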

Step 3: Apply Allocation Percentages

# Calculate each tenant's share of shared infrastructure
apim_cost = 1500
app_gateway_cost = 400
shares = {"wlrs": 0.45, "sdpr": 0.30, "others": 0.25}  # fractions from Step 1

tenant_allocations = {
    tenant: {
        "apim": apim_cost * share,
        "gateway": app_gateway_cost * share,
        "total_shared": (apim_cost + app_gateway_cost) * share,
    }
    for tenant, share in shares.items()
}
# wlrs:   apim $675, gateway $180, total_shared $855
# sdpr:   apim $450, gateway $120, total_shared $570
# others: apim $375, gateway $100, total_shared $475

Step 4: Write Allocation Results to Log Analytics

# Store calculated allocations for reporting.
# log_analytics_client is a placeholder for the ingestion client in use.
log_analytics_client.post(
    workspace_id,
    log_type="SharedCostAllocation",
    json=[
        {
            "Month": "2026-01",
            "TenantId": "wlrs",
            "APIManagement": 675,
            "AppGateway": 180,
            "NetworkEgress": 0,  # Calculated separately in Step 5
            "TotalSharedCost": 855,
            "AllocationMethod": "api-call-proportional"
        },
        # ... repeat for other tenants
    ],
)

Step 5: Calculate Network Egress Costs

Network egress is billed at the subscription level and requires custom calculation from diagnostic logs.

Enable Network Diagnostics:

App Gateway → Diagnostic Settings → Send to Log Analytics
→ Enable "Access Logs" (contains bytes sent per request)

Query Logs to Calculate Per-Tenant Egress:

AzureDiagnostics
| where ResourceType == "APPLICATIONGATEWAYS"
| where Category == "ApplicationGatewayAccessLog"
| extend TenantId = extract("tenant=([^&]+)", 1, requestUri_s) // Parse from URL
| summarize TotalEgressGB = sum(sentBytes_d) / 1073741824 by TenantId
| extend EgressCost = case(
    TotalEgressGB <= 100, 0.0,                              // First 100 GB free
    TotalEgressGB <= 10240, (TotalEgressGB - 100) * 0.087,  // $0.087/GB
    (10140 * 0.087) + ((TotalEgressGB - 10240) * 0.067)     // Default: next tier at $0.067/GB
)

Write these egress costs to the `SharedCostAllocation` table alongside other shared infrastructure costs.
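Because the free allowance and the price tiers apply to the subscription's total egress, pricing each tenant's traffic on its own can give every tenant a separate free 100 GB. One alternative, sketched here as an option rather than the adopted design, is to take the egress amount actually billed and split it by each tenant's share of bytes sent. The function name and the sample volumes are illustrative.

```python
GIB = 1024 ** 3  # bytes per GiB, matching the divisor used in the query


def split_egress_cost(billed_egress_cost, bytes_by_tenant):
    """Split a subscription-level egress charge by each tenant's bytes sent."""
    total_bytes = sum(bytes_by_tenant.values())
    if total_bytes == 0:
        return {tenant: 0.0 for tenant in bytes_by_tenant}
    return {
        tenant: round(billed_egress_cost * sent / total_bytes, 2)
        for tenant, sent in bytes_by_tenant.items()
    }


# 1,100 GB in total: 100 GB free + 1,000 GB at $0.087 = $87.00 billed
egress_shares = split_egress_cost(
    87.00,
    {"wlrs": 600 * GIB, "sdpr": 300 * GIB, "others": 200 * GIB},
)
print(egress_shares)
```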

Step 6: Combine All Costs in Final Chargeback Report

Monthly Kusto query for chargeback:

// Direct costs from Azure Cost Management (Foundry projects, DI instances)
let DirectCosts = AzureCosts
| where isnotempty(Tags["tenant-id"])
| summarize DirectCost = sum(Cost) by TenantId = tostring(Tags["tenant-id"]);

// Allocated shared infrastructure costs from custom calculation
let AllocatedCosts = SharedCostAllocation_CL
| where Month == format_datetime(startofmonth(now()), "yyyy-MM") // Month is stored as "yyyy-MM" in Step 4
| summarize AllocatedCost = sum(TotalSharedCost) by TenantId;

// Final chargeback report
DirectCosts
| join kind=leftouter AllocatedCosts on TenantId
| extend AllocatedCost = coalesce(AllocatedCost, 0.0) // Tenants with no shared allocation
| extend TotalChargeback = DirectCost + AllocatedCost
| project TenantId, DirectCost, AllocatedCost, TotalChargeback

Output:

TenantId	DirectCost	AllocatedCost	TotalChargeback
wlrs	$4,200	$855	$5,055
sdpr	$2,400	$570	$2,970

Note: DirectCost includes all Azure OpenAI token consumption, Document Intelligence page processing, Cosmos DB, AI Search, and storage—automatically calculated by Azure based on actual usage.
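The same combination can be checked outside Log Analytics. The sketch below mirrors a left outer join from direct costs to allocated costs, treating a missing allocation as zero so that a tenant with no shared usage still gets a total. The function name is illustrative, and the inputs are the example figures from the output table.

```python
def chargeback_report(direct_costs, allocated_costs):
    """Join direct and allocated costs per tenant; missing allocations count as 0."""
    report = []
    for tenant in sorted(direct_costs):
        direct = direct_costs[tenant]
        allocated = allocated_costs.get(tenant, 0.0)
        report.append({
            "TenantId": tenant,
            "DirectCost": direct,
            "AllocatedCost": allocated,
            "TotalChargeback": direct + allocated,
        })
    return report


report = chargeback_report(
    direct_costs={"wlrs": 4200, "sdpr": 2400},
    allocated_costs={"wlrs": 855, "sdpr": 570},
)
for row in report:
    print(row)
```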

Summary

This document establishes a comprehensive cost tracking framework for the AI Services Hub using:

Direct attribution for tenant-dedicated resources (Foundry projects with Azure OpenAI/Cosmos/AI Search, Document Intelligence instances)—Azure automatically bills these based on actual consumption
Proportional allocation for shared infrastructure (APIM, App Gateway, monitoring)—custom calculations split costs based on API call volume and network usage
Minimal custom tracking via Event Hubs and Azure Functions—only needed for shared infrastructure allocation, not for AI service consumption
Dual calculation methods: Azure Cost Management for tag-based direct attribution (90% of costs), and custom code for usage-based shared infrastructure allocation (10% of costs)

The implementation provides full cost transparency while minimizing operational overhead—most costs are automatically tracked by Azure, with custom calculations only for shared platform services.

Related

ADR-006

Multi-Tenant Isolation

Diagrams

Architecture diagrams

Terraform Reference

Module variable & output docs
