AI Services Hub
Azure Landing Zone Infrastructure

Internal Gateway Endpoints

Each tenant exposes two internal gateway endpoints that are available only to authenticated callers holding a valid Azure API Management (APIM) subscription key. These endpoints do not forward traffic to an AI backend. Instead, the gateway policy handles the request directly and returns the response itself.

Authentication Required
All internal endpoints require a valid APIM subscription key in the api-key header. Unauthenticated requests receive 401 Unauthorized. Only GET is accepted; any other method returns 405 Method Not Allowed.
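
For callers working in Python rather than curl, the header and method rules above could be captured in a small request builder. This is a minimal sketch using only the standard library. The helper names build_internal_request and fetch_json are illustrative assumptions and are not part of the hub tooling.

```python
import json
import urllib.request


def build_internal_request(gateway_url: str, tenant: str, endpoint: str,
                           subscription_key: str) -> urllib.request.Request:
    """Build a GET request for /{tenant}/internal/{endpoint} (hypothetical helper)."""
    url = f"{gateway_url.rstrip('/')}/{tenant}/internal/{endpoint}"
    # Internal endpoints accept GET only and expect the key in the api-key header.
    return urllib.request.Request(url, headers={"api-key": subscription_key}, method="GET")


def fetch_json(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body; HTTP errors raise urllib.error.HTTPError."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Passing "apim-keys" or "tenant-info" as the endpoint argument targets either of the two endpoints described on this page.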

/internal/apim-keys

Returns both subscription keys and rotation metadata sourced from the centralized hub Key Vault at request time.

Requires: APIM enabled with subscription-key auth. Available for all subscription-key tenants (not gated on per-tenant key rotation).


/internal/tenant-info

Returns the tenant's deployed AI models with quota information and a list of enabled services. Always available — no Key Vault access needed.

Requires: nothing beyond a valid subscription key.


GET /{tenant-name}/internal/apim-keys

Overview

Returns the tenant's current primary and secondary APIM subscription keys, plus rotation metadata, from the centralized hub Key Vault. Use this endpoint to programmatically refresh your stored key after a rotation event without needing direct Key Vault access.

See the APIM Key Rotation guide for the full operational workflow, rotation schedule, and troubleshooting information.

Authentication

Pass either the primary or secondary subscription key in the api-key header. APIM validates the key before running any policy logic, so after a rotation the key in the slot that was not regenerated continues to work.

curl -s -H "api-key: YOUR_SUBSCRIPTION_KEY" \
    "https://your-apim-gateway.azure-api.net/YOUR_TENANT/internal/apim-keys" | jq .

How It Works

  1. APIM validates the incoming subscription key (standard APIM behavior).
  2. APIM policy uses its system-assigned managed identity to read secrets from the centralized hub Key Vault: {tenant}-apim-primary-key and {tenant}-apim-secondary-key (always), plus {tenant}-apim-rotation-metadata (only for tenants with key_rotation_enabled = true).
  3. Secrets are assembled into a JSON response and returned directly — no backend service is called. Tenants without rotation enabled receive a default rotation object indicating rotation is not active.
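
The secret naming convention in step 2 can be sketched as a small function. This is an illustration of the naming rule only, not the policy itself (which lives in the APIM policy template). The function name hub_secret_names is an assumption made for this example.

```python
def hub_secret_names(tenant: str, key_rotation_enabled: bool) -> list[str]:
    """List the hub Key Vault secrets read for a tenant (illustrative sketch)."""
    # The two key secrets are always read.
    names = [f"{tenant}-apim-primary-key", f"{tenant}-apim-secondary-key"]
    # Rotation metadata is read only when key_rotation_enabled = true.
    if key_rotation_enabled:
        names.append(f"{tenant}-apim-rotation-metadata")
    return names
```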

Response (200 OK)

{
  "tenant": "your-tenant-name",
  "primary_key": "abc123def456...",
  "secondary_key": "ghi789jkl012...",
  "rotation": {
    "last_rotated_slot": "primary",
    "last_rotation_at": "2026-02-11T02:00:00Z",
    "next_rotation_at": "2026-02-18T02:00:00Z",
    "rotation_number": 5,
    "safe_slot": "secondary"
  },
  "keyvault": {
    "uri": "https://hub-kv.vault.azure.net/",
    "primary_key_secret": "your-tenant-name-apim-primary-key",
    "secondary_key_secret": "your-tenant-name-apim-secondary-key"
  }
}

Field | Type | Description
tenant | string | The tenant identifier (matches the API path prefix).
primary_key | string | Current value of the APIM primary subscription key slot.
secondary_key | string | Current value of the APIM secondary subscription key slot.
rotation.last_rotated_slot | string | "primary" or "secondary"; the slot regenerated most recently.
rotation.safe_slot | string | The slot that was not just rotated. Switch to this key.
rotation.last_rotation_at | ISO 8601 | Timestamp of the most recent rotation.
rotation.next_rotation_at | ISO 8601 | Estimated timestamp of the next scheduled rotation.
rotation.rotation_number | number | Monotonically increasing rotation counter.
keyvault.uri | string | Hub Key Vault base URI.
keyvault.primary_key_secret | string | Secret name for the primary key in the hub Key Vault.
keyvault.secondary_key_secret | string | Secret name for the secondary key in the hub Key Vault.
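
A client refreshing its stored key could pick the value named by rotation.safe_slot. The sketch below assumes the response has already been parsed into a dict. The function name select_safe_key is hypothetical, and the fallback to the primary slot when safe_slot is absent or unrecognized is an assumption, since the shape of the default rotation object for tenants without rotation is not documented here.

```python
def select_safe_key(payload: dict) -> str:
    """Return the subscription key from the slot that was not just rotated."""
    slot = payload.get("rotation", {}).get("safe_slot")
    if slot not in ("primary", "secondary"):
        # Assumption: fall back to the primary slot when no safe slot is reported.
        slot = "primary"
    return payload[f"{slot}_key"]
```

With the sample response above, safe_slot is "secondary", so the function returns the secondary_key value.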

Error Responses

Status | Cause | Body (error.code)
401 | Missing or invalid subscription key. | Standard APIM 401
404 | Tenant does not use subscription-key auth mode; endpoint not present in policy. | NotFound
405 | Non-GET method used. | MethodNotAllowed
502 | APIM managed identity could not read one or more secrets from the hub Key Vault. | KeyVaultReadFailed
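
A caller might turn these statuses into a log-friendly message as sketched below. The helper name describe_failure is hypothetical. The code assumes the error body is JSON shaped like {"error": {"code": "..."}}, which is inferred from the "Body (error.code)" column, and it tolerates bodies that do not match (such as the standard APIM 401).

```python
import json

# Causes copied from the error table for /internal/apim-keys.
CAUSES = {
    401: "missing or invalid subscription key",
    404: "tenant does not use subscription-key auth; endpoint not present",
    405: "non-GET method used",
    502: "APIM managed identity could not read hub Key Vault secrets",
}


def describe_failure(status: int, body: str) -> str:
    """Summarize a failed response as '<status> <error.code>: <cause>'."""
    code = None
    try:
        parsed = json.loads(body)
        # Assumed body shape: {"error": {"code": "KeyVaultReadFailed", ...}}
        if isinstance(parsed, dict) and isinstance(parsed.get("error"), dict):
            code = parsed["error"].get("code")
    except json.JSONDecodeError:
        pass  # Non-JSON or empty body: report the status alone.
    cause = CAUSES.get(status, "unexpected status")
    return f"{status} {code or 'no error.code'}: {cause}"
```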

Infrastructure

Implemented purely in APIM policy. APIM's system-assigned managed identity holds a single Key Vault Secrets User RBAC assignment on the centralized hub Key Vault — one assignment scales to all tenants. Available for all subscription-key tenants when APIM is enabled. Rotation metadata is included only for tenants with key_rotation_enabled = true; other tenants receive a default rotation object indicating rotation is not active.

Source: infra-ai-hub/params/apim/api_policy.xml.tftpl (apim_keys_endpoint_enabled block).

GET /{tenant-name}/internal/tenant-info

Overview

Returns the tenant's deployed AI models (with quota information) and the set of enabled AI services. The response is fully static — data is baked in at Terraform deploy time via templatefile(), so no Key Vault or backend calls are made at runtime. Always available; there is no feature flag that disables this endpoint.

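Use this endpoint to discover, at runtime, which model deployments and services are enabled for the tenant, along with their token limits and endpoint URLs.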

Provisioned throughput note: For provisioned models, Microsoft Foundry measures throughput as input-equivalent tokens per minute, and some models weight output tokens more heavily than input tokens. For WLRS gpt-5.1, APIM now enforces non-streaming traffic on a dedicated PTU backend using actual response usage with prompt tokens weighted x1 and completion tokens weighted x8 against a 71,250 weighted TPM ceiling. The conservative 8,906 raw TPM value remains published as a fallback for streaming requests, where APIM cannot reliably read final SSE usage before returning the stream.
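
The weighting in the note can be worked through numerically. The sketch below uses the 15 PTU capacity and 4,750 input TPM per PTU that appear in the sample response on this page. The relationships 15 × 4,750 = 71,250 and 71,250 ÷ 8 ≈ 8,906 are inferred from the published numbers and are not stated as formulas by the source. The function name weighted_tokens is illustrative.

```python
def weighted_tokens(prompt_tokens: int, completion_tokens: int,
                    prompt_weight: int = 1, completion_weight: int = 8) -> int:
    """Input-equivalent cost of one response under the x1 / x8 weighting."""
    return prompt_tokens * prompt_weight + completion_tokens * completion_weight


# Values from the gpt-5.1 entry in the sample response.
PTU_CAPACITY = 15
INPUT_TPM_PER_PTU = 4750
COMPLETION_WEIGHT = 8

# Inferred: the weighted ceiling equals capacity times input TPM per PTU.
weighted_ceiling = PTU_CAPACITY * INPUT_TPM_PER_PTU        # 71250
# Inferred: the raw fallback treats every token as an output token.
raw_fallback = weighted_ceiling // COMPLETION_WEIGHT       # 8906

# A response with 1,000 prompt tokens and 500 completion tokens costs
# 1000 * 1 + 500 * 8 = 5000 weighted tokens against the 71,250 ceiling.
example_cost = weighted_tokens(1000, 500)
```

Under this reading, the raw fallback is conservative because it budgets as if all traffic were completion tokens.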

Authentication

Pass the subscription key in the api-key header. APIM validates it before returning any data.

curl -s -H "api-key: YOUR_SUBSCRIPTION_KEY" \
    "https://your-apim-gateway.azure-api.net/YOUR_TENANT/internal/tenant-info" | jq .

How It Works

  1. APIM validates the incoming subscription key.
  2. The policy matches paths ending in internal/tenant-info and returns the pre-rendered JSON body directly — no backend service is called.
  3. The JSON is generated once by Terraform's templatefile() from values in tenant.tfvars (openai.model_deployments, service feature flags) and embedded in the APIM policy at deploy time.
Static data: Model or service flag changes appear in the response only after a new deploy of the Terraform apim stack.
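
The request flow above can be modeled in a few lines. This is a simplified Python model of the documented behavior, not the APIM policy. The function name handle_tenant_info and the boolean key_is_valid are stand-ins for what APIM does natively, and the placement of the method check after the path match is an assumption.

```python
from typing import Optional, Tuple


def handle_tenant_info(method: str, path: str, key_is_valid: bool,
                       rendered_body: str) -> Optional[Tuple[int, Optional[str]]]:
    """Return (status, body), or None when the path is not the tenant-info endpoint."""
    # Step 1: subscription key validation happens before any policy logic.
    if not key_is_valid:
        return 401, None
    # Step 2: the policy matches paths ending in internal/tenant-info.
    if not path.rstrip("/").endswith("internal/tenant-info"):
        return None  # Not handled here; other policy branches apply.
    if method.upper() != "GET":
        return 405, None
    # Step 3: the body was rendered at deploy time, so nothing is fetched at runtime.
    return 200, rendered_body
```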

Response (200 OK)

{
  "tenant": "wlrs-water-form-assistant",
  "base_url": "https://aihub.gov.bc.ca/wlrs-water-form-assistant",
  "models": [
    {
      "name": "gpt-5.1",
      "model_name": "gpt-5.1",
      "model_version": "2025-11-13",
      "scale_type": "GlobalProvisionedManaged",
      "capacity": 15,
      "capacity_unit": "PTU",
      "capacity_k_tpm": null,
      "input_tpm_per_ptu": 4750,
      "output_tokens_to_input_ratio": 8,
      "token_limit_strategy": "response_weighted_actual_tokens",
      "prompt_tokens_weight": 1,
      "completion_tokens_weight": 8,
      "weighted_tokens_per_minute": 71250,
      "apim_raw_tokens_per_minute": 8906,
      "input_equivalent_tokens_per_minute": 71250,
      "tokens_per_minute": 8906,
      "endpoints": {
        "azure_openai": {
          "endpoint": "https://aihub.gov.bc.ca/wlrs-water-form-assistant",
          "api_version": "2025-03-01-preview",
          "url": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/openai/deployments/gpt-5.1/chat/completions?api-version=2025-03-01-preview"
        },
        "openai_compatible": {
          "base_url": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/openai/v1",
          "model": "gpt-5.1",
          "url": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/openai/v1/chat/completions"
        }
      }
    },
    {
      "name": "gpt-4.1-mini",
      "model_name": "gpt-4.1-mini",
      "model_version": "2025-04-14",
      "scale_type": "GlobalStandard",
      "capacity": 1500,
      "capacity_unit": "k TPM",
      "capacity_k_tpm": 1500,
      "input_tpm_per_ptu": null,
      "output_tokens_to_input_ratio": null,
      "token_limit_strategy": "raw_tokens_per_minute",
      "prompt_tokens_weight": 1,
      "completion_tokens_weight": 1,
      "weighted_tokens_per_minute": 1500000,
      "apim_raw_tokens_per_minute": 1500000,
      "input_equivalent_tokens_per_minute": 1500000,
      "tokens_per_minute": 1500000,
      "endpoints": {
        "azure_openai": {
          "endpoint": "https://aihub.gov.bc.ca/wlrs-water-form-assistant",
          "api_version": "2025-03-01-preview",
          "url": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/openai/deployments/gpt-4.1-mini/chat/completions?api-version=2025-03-01-preview"
        },
        "openai_compatible": {
          "base_url": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/openai/v1",
          "model": "gpt-4.1-mini",
          "url": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/openai/v1/chat/completions"
        }
      }
    }
  ],
  "services": {
    "openai": {
      "enabled": true,
      "endpoints": {
        "azure_openai": "https://aihub.gov.bc.ca/wlrs-water-form-assistant",
        "openai_compatible": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/openai/v1",
        "api_version": "2025-03-01-preview"
      }
    },
    "document_intelligence": {
      "enabled": true,
      "endpoint": "https://aihub.gov.bc.ca/wlrs-water-form-assistant",
      "example": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-11-30"
    },
    "ai_search": {
      "enabled": true,
      "endpoint": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/ai-search",
      "example": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/ai-search/indexes/{index-name}/docs/search?api-version=2024-07-01"
    },
    "speech_services": {
      "enabled": true,
      "stt_endpoint": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/speech/recognition/conversation/cognitiveservices/v1?language=en-US",
      "tts_endpoint": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/cognitiveservices/v1"
    },
    "storage": {
      "enabled": true,
      "endpoint": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/storage",
      "example": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/storage/{container-name}/{blob-name}"
    }
  }
}

Field | Type | Description
tenant | string | The tenant identifier (matches the API path prefix).
base_url | string | Client-facing base URL for this tenant. Uses App Gateway URL when deployed; falls back to direct APIM gateway URL.
models[].name | string | Deployment name used in API calls (e.g., gpt-4.1-mini).
models[].model_name | string | Underlying Azure OpenAI model name.
models[].model_version | string | Model version date string.
models[].scale_type | string | Deployment scale type (e.g., GlobalStandard).
models[].capacity | number | Raw capacity value from tenant.tfvars. This is expressed in capacity_unit.
models[].capacity_unit | string | Capacity unit for the deployment. Current values are k TPM for quota-based deployments and PTU for provisioned deployments.
models[].capacity_k_tpm | number or null | Allocated quota in thousands of tokens per minute for quota-based deployments. null for provisioned deployments.
models[].input_tpm_per_ptu | number or null | Model-specific input TPM available per PTU for provisioned deployments. null for quota-based deployments.
models[].output_tokens_to_input_ratio | number or null | For provisioned deployments, how many input-equivalent tokens one output token consumes. null for quota-based deployments.
models[].token_limit_strategy | string | The APIM enforcement mode for the model. Current values are raw_tokens_per_minute for quota-backed models and response_weighted_actual_tokens for provisioned (PTU) deployments.
models[].prompt_tokens_weight | number | The multiplier APIM applies to prompt tokens when computing weighted token counts for PTU rate limiting.
models[].completion_tokens_weight | number | The multiplier APIM applies to completion tokens when computing weighted token counts for PTU rate limiting.
models[].weighted_tokens_per_minute | number | The weighted TPM budget used by APIM. For provisioned (PTU) deployments this matches the Foundry input-equivalent TPM ceiling; for quota-based models it matches the raw TPM cap.
models[].apim_raw_tokens_per_minute | number | The raw prompt + completion token ceiling. For quota-based models this is the enforced limit. For provisioned (PTU) deployments this is retained as the streaming/raw fallback cap.
models[].input_equivalent_tokens_per_minute | number | The Foundry throughput ceiling expressed in input-equivalent tokens per minute. For quota-based deployments this matches the APIM cap. For provisioned deployments this is typically higher than the raw fallback cap because output tokens are weighted more heavily.
models[].tokens_per_minute | number | Legacy alias for apim_raw_tokens_per_minute. Use the explicit fields above for new consumers.
models[].endpoints.azure_openai.endpoint | string | Endpoint for the Azure OpenAI SDK (AzureOpenAI(azure_endpoint=...)).
models[].endpoints.azure_openai.api_version | string | Recommended API version for Azure OpenAI SDK calls.
models[].endpoints.azure_openai.url | string | Full URL for this deployment’s chat completions via Azure OpenAI SDK format.
models[].endpoints.openai_compatible.base_url | string | Base URL for the OpenAI SDK (OpenAI(base_url=...)).
models[].endpoints.openai_compatible.model | string | Model name to pass in request body for the OpenAI SDK.
models[].endpoints.openai_compatible.url | string | Full URL for this deployment’s chat completions via OpenAI-compatible format.
services.<name>.enabled | bool | true when the service is enabled for this tenant; false otherwise (only enabled key present).
services.openai.endpoints | object | Top-level OpenAI endpoints: azure_openai, openai_compatible, and recommended api_version.
services.document_intelligence.endpoint | string | Base endpoint for Document Intelligence API calls.
services.document_intelligence.example | string | Example URL for the prebuilt-layout analyze operation.
services.ai_search.endpoint | string | Base endpoint for AI Search API calls (includes /ai-search prefix).
services.ai_search.example | string | Example URL for a search index query.
services.speech_services.stt_endpoint | string | Full endpoint for Speech-to-Text recognition requests (includes path and default language).
services.speech_services.tts_endpoint | string | Full endpoint for Text-to-Speech synthesis requests.
services.storage.endpoint | string | Base endpoint for Storage proxy requests (includes /storage prefix).
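
A consumer might read the parsed response along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the input is the tenant-info JSON already decoded into a dict, and the rule in applicable_tpm_limit (weighted budget for non-streaming calls on response_weighted_actual_tokens models, raw cap otherwise) is a reading of the field descriptions rather than a documented client contract.

```python
def enabled_services(info: dict) -> list[str]:
    """Names of services whose enabled flag is true, sorted alphabetically."""
    return sorted(name for name, svc in info.get("services", {}).items()
                  if svc.get("enabled"))


def openai_sdk_settings(info: dict, deployment: str) -> dict:
    """Values to pass to an OpenAI-compatible client for one deployment."""
    for model in info.get("models", []):
        if model["name"] == deployment:
            compat = model["endpoints"]["openai_compatible"]
            return {"base_url": compat["base_url"], "model": compat["model"]}
    raise KeyError(f"deployment {deployment!r} not found in tenant-info")


def applicable_tpm_limit(model: dict, streaming: bool) -> int:
    """Pick the per-minute budget that appears to apply to a request."""
    weighted = model["token_limit_strategy"] == "response_weighted_actual_tokens"
    if weighted and not streaming:
        return model["weighted_tokens_per_minute"]
    # Quota-based models, and streaming on PTU models, use the raw cap.
    return model["apim_raw_tokens_per_minute"]
```

Applied to the gpt-5.1 entry in the sample response, applicable_tpm_limit gives 71250 for non-streaming requests and 8906 for streaming requests.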

Error Responses

Status | Cause
401 | Missing or invalid subscription key.
405 | Non-GET method used.

Infrastructure

Implemented purely in APIM policy. Enabled for every tenant unconditionally (tenant_info_enabled = true in stacks/apim/locals.tf). No Key Vault grants required.

Source: infra-ai-hub/params/apim/api_policy.xml.tftpl (tenant_info_enabled block).

Integration Tests

Both endpoints have Python integration test coverage in tests/integration/tests/:

Suite | Endpoint | Proxy required
test_apim_key_rotation.py | /internal/apim-keys | Yes. Key Vault is private-only. Runs via the chisel/privoxy tunnel in CI.
test_tenant_info.py | /internal/tenant-info | No. Response is static, no Key Vault calls. Runs in the direct (no-proxy) CI step.

# Run tenant-info tests locally
cd tests/integration
./run-tests.sh --env test tenant-info

# Run apim-keys tests (requires az login + hub Key Vault access)
./run-tests.sh --env test --group proxy apim-key-rotation