Internal Gateway Endpoints

Each tenant exposes two internal gateway endpoints, available only to callers that authenticate with a valid Azure API Management (APIM) subscription key. These endpoints do not forward traffic to an AI backend. Instead, the gateway policy handles the request directly and returns the response itself.

Authentication Required
All internal endpoints require a valid subscription key for the gateway in the api-key header. Unauthenticated requests receive 401 Unauthorized. Only GET is accepted, and any other method returns 405 Method Not Allowed.

`/internal/apim-keys`

Returns both subscription keys and rotation metadata sourced from the centralized hub Key Vault at request time.

Requires: APIM enabled with subscription-key auth. Available for all subscription-key tenants (not gated on per-tenant key rotation).

`/internal/tenant-info`

Returns the tenant's deployed AI models with quota information and a list of enabled services. Always available — no Key Vault access needed.

Requires: nothing beyond a valid subscription key.

`GET /{tenant-name}/internal/apim-keys`

Overview

Returns the tenant's current primary and secondary APIM subscription keys, plus rotation metadata, from the centralized hub Key Vault. Use this endpoint to programmatically refresh your stored key after a rotation event without needing direct Key Vault access.

See the APIM Key Rotation guide for the full operational workflow, rotation schedule, and troubleshooting information.

Authentication

Pass either the primary or secondary subscription key in the api-key header. APIM validates the key before running any policy logic, so immediately after a rotation the key in the slot that was not rotated still works.
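A minimal Python sketch of this authenticated call, using only the standard library. The function names `build_keys_request` and `fetch_keys` are illustrative and not part of the gateway or any SDK; the gateway host and tenant are the same placeholders used elsewhere on this page.

```python
import json
import urllib.request


def build_keys_request(gateway_url: str, tenant: str, subscription_key: str) -> urllib.request.Request:
    """Build a GET request for /{tenant}/internal/apim-keys with the api-key header."""
    url = f"{gateway_url.rstrip('/')}/{tenant}/internal/apim-keys"
    # Only GET is accepted by the internal endpoints, so the method is set explicitly.
    return urllib.request.Request(url, headers={"api-key": subscription_key}, method="GET")


def fetch_keys(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_keys_request(
        "https://your-apim-gateway.azure-api.net", "YOUR_TENANT", "YOUR_SUBSCRIPTION_KEY"
    )
    print(req.get_method(), req.full_url)
```

Separating request construction from the network call keeps the header and URL logic easy to check without reaching the gateway.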

curl -s -H "api-key: YOUR_SUBSCRIPTION_KEY" \
    "https://your-apim-gateway.azure-api.net/YOUR_TENANT/internal/apim-keys" | jq .

How It Works

APIM validates the incoming subscription key (standard APIM behavior).
APIM policy uses its system-assigned managed identity to read secrets from the centralized hub Key Vault: {tenant}-apim-primary-key and {tenant}-apim-secondary-key (always), plus {tenant}-apim-rotation-metadata (only for tenants with key_rotation_enabled = true).
Secrets are assembled into a JSON response and returned directly — no backend service is called. Tenants without rotation enabled receive a default rotation object indicating rotation is not active.

Response (200 OK)

{
  "tenant": "your-tenant-name",
  "primary_key": "abc123def456...",
  "secondary_key": "ghi789jkl012...",
  "rotation": {
    "last_rotated_slot": "primary",
    "last_rotation_at": "2026-02-11T02:00:00Z",
    "next_rotation_at": "2026-02-18T02:00:00Z",
    "rotation_number": 5,
    "safe_slot": "secondary"
  },
  "keyvault": {
    "uri": "https://hub-kv.vault.azure.net/",
    "primary_key_secret": "your-tenant-name-apim-primary-key",
    "secondary_key_secret": "your-tenant-name-apim-secondary-key"
  }
}

Field	Type	Description
`tenant`	string	The tenant identifier (matches the API path prefix).
`primary_key`	string	Current value of the APIM primary subscription key slot.
`secondary_key`	string	Current value of the APIM secondary subscription key slot.
`rotation.last_rotated_slot`	string	`"primary"` or `"secondary"` — the slot regenerated most recently.
`rotation.safe_slot`	string	The slot that was not just rotated. Switch to this key.
`rotation.last_rotation_at`	ISO 8601	Timestamp of the most recent rotation.
`rotation.next_rotation_at`	ISO 8601	Estimated timestamp of the next scheduled rotation.
`rotation.rotation_number`	number	Monotonically increasing rotation counter.
`keyvault.uri`	string	Hub Key Vault base URI.
`keyvault.primary_key_secret`	string	Secret name for the primary key in the hub Key Vault.
`keyvault.secondary_key_secret`	string	Secret name for the secondary key in the hub Key Vault.
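As a rough illustration of how a client might use these fields after a rotation, the sketch below picks the key named by `rotation.safe_slot` and checks whether `rotation.rotation_number` has advanced. The helper names are hypothetical. The fallback to the primary slot when `safe_slot` is absent is an assumption, because the exact shape of the default rotation object for tenants without rotation is not specified here.

```python
def select_safe_key(payload: dict) -> tuple:
    """Return (slot, key) for the slot that was not just rotated."""
    rotation = payload.get("rotation") or {}
    slot = rotation.get("safe_slot")
    if slot not in ("primary", "secondary"):
        # Assumption: with no usable safe_slot (for example, rotation not active),
        # fall back to the primary key.
        slot = "primary"
    return slot, payload[f"{slot}_key"]


def rotation_advanced(previous_number: int, payload: dict) -> bool:
    """True when the rotation counter is higher than the last value the client saw."""
    current = (payload.get("rotation") or {}).get("rotation_number")
    return isinstance(current, int) and current > previous_number


# Trimmed copy of the sample response above.
sample = {
    "tenant": "your-tenant-name",
    "primary_key": "abc123def456...",
    "secondary_key": "ghi789jkl012...",
    "rotation": {"last_rotated_slot": "primary", "rotation_number": 5, "safe_slot": "secondary"},
}

slot, key = select_safe_key(sample)
print(slot)  # secondary
```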

Error Responses

Status	Cause	Body (`error.code`)
`401`	Missing or invalid subscription key.	Standard APIM 401
`404`	Tenant does not use subscription-key auth mode; endpoint not present in policy.	`NotFound`
`405`	Non-GET method used.	`MethodNotAllowed`
`502`	APIM managed identity could not read one or more secrets from the hub Key Vault.	`KeyVaultReadFailed`
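One way a client could act on these statuses is sketched below. It assumes the error body is JSON of the form `{"error": {"code": "..."}}`, which is inferred from the `error.code` column header rather than confirmed, and it treats only `502` as worth retrying, which is a design assumption and not documented gateway behavior. `describe_error` is a hypothetical helper.

```python
import json

# Causes copied from the table above.
_CAUSES = {
    401: "Missing or invalid subscription key.",
    404: "Tenant does not use subscription-key auth mode; endpoint not present in policy.",
    405: "Non-GET method used.",
    502: "APIM managed identity could not read one or more secrets from the hub Key Vault.",
}


def extract_error_code(body):
    """Return error.code from a JSON body, or None if the body has a different shape."""
    try:
        parsed = json.loads(body)
    except (TypeError, ValueError):
        return None
    error = parsed.get("error") if isinstance(parsed, dict) else None
    return error.get("code") if isinstance(error, dict) else None


def describe_error(status: int, body: str = "") -> dict:
    """Summarize an error response from /internal/apim-keys."""
    return {
        "status": status,
        "code": extract_error_code(body),
        "cause": _CAUSES.get(status, "Unexpected status."),
        # Assumption: a Key Vault read failure may be transient, so only 502 is retried.
        "retryable": status == 502,
    }


print(describe_error(502, '{"error": {"code": "KeyVaultReadFailed"}}')["code"])
```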

Infrastructure

Implemented purely in APIM policy. APIM's system-assigned managed identity holds a single Key Vault Secrets User RBAC assignment on the centralized hub Key Vault — one assignment scales to all tenants. Available for all subscription-key tenants when APIM is enabled. Rotation metadata is included only for tenants with key_rotation_enabled = true; other tenants receive a default rotation object indicating rotation is not active.

Source: infra-ai-hub/params/apim/api_policy.xml.tftpl (apim_keys_endpoint_enabled block).

`GET /{tenant-name}/internal/tenant-info`

Overview

Returns the tenant's deployed AI models (with quota information) and the set of enabled AI services. The response is fully static — data is baked in at Terraform deploy time via templatefile(), so no Key Vault or backend calls are made at runtime. Always available; there is no feature flag that disables this endpoint.

Use this endpoint to:

Discover which models are available, including the weighted PTU limit, raw fallback cap, and other provisioned-throughput metadata.
Confirm which AI services (Document Intelligence, AI Search, Speech, etc.) are enabled for your tenant.
Drive UI dropdowns or config auto-discovery without maintaining a separate config file.

Provisioned throughput note: For provisioned models, Microsoft Foundry measures throughput as input-equivalent tokens per minute, and some models weight output tokens more heavily than input tokens. For WLRS gpt-5.1, APIM now enforces non-streaming traffic on a dedicated PTU backend using actual response usage with prompt tokens weighted x1 and completion tokens weighted x8 against a 71,250 weighted TPM ceiling. The conservative 8,906 raw TPM value remains published as a fallback for streaming requests, where APIM cannot reliably read final SSE usage before returning the stream.
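The arithmetic behind these numbers can be checked with a short sketch. The capacity (15 PTU) and per-PTU rate (4,750 input TPM) are taken from the gpt-5.1 entry in the sample response on this page. Deriving 8,906 by dividing the weighted ceiling by the completion weight and rounding down is an inference that happens to match the published value; the source does not state how that figure was computed.

```python
CAPACITY_PTU = 15          # from the sample gpt-5.1 deployment
INPUT_TPM_PER_PTU = 4750   # from the sample gpt-5.1 deployment
PROMPT_WEIGHT = 1
COMPLETION_WEIGHT = 8

# Weighted (input-equivalent) ceiling: 15 * 4750 = 71250
WEIGHTED_TPM_CEILING = CAPACITY_PTU * INPUT_TPM_PER_PTU

# Inferred derivation of the conservative raw fallback: 71250 // 8 = 8906
RAW_FALLBACK_TPM = WEIGHTED_TPM_CEILING // COMPLETION_WEIGHT


def weighted_usage(prompt_tokens: int, completion_tokens: int) -> int:
    """Weighted tokens charged for one non-streaming response."""
    return prompt_tokens * PROMPT_WEIGHT + completion_tokens * COMPLETION_WEIGHT


# A response with 1,000 prompt tokens and 500 completion tokens
# counts as 1000 + 500 * 8 = 5000 weighted tokens.
print(WEIGHTED_TPM_CEILING, RAW_FALLBACK_TPM, weighted_usage(1000, 500))
```

Under these weights, completion-heavy responses consume the budget much faster than their raw token count suggests, which is why the raw fallback cap is so much lower than the weighted ceiling.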

Authentication

Pass the subscription key in the api-key header. APIM validates it before returning any data.

curl -s -H "api-key: YOUR_SUBSCRIPTION_KEY" \
    "https://your-apim-gateway.azure-api.net/YOUR_TENANT/internal/tenant-info" | jq .

How It Works

APIM validates the incoming subscription key.
The policy matches paths ending in internal/tenant-info and returns the pre-rendered JSON body directly — no backend service is called.
The JSON is generated once by Terraform's templatefile() from values in tenant.tfvars (openai.model_deployments, service feature flags) and embedded in the APIM policy at deploy time.

Static data: Changes to models or service flags appear in the response only after the Terraform apim stack is redeployed.

Response (200 OK)

{
  "tenant": "wlrs-water-form-assistant",
  "base_url": "https://aihub.gov.bc.ca/wlrs-water-form-assistant",
  "models": [
    {
      "name": "gpt-5.1",
      "model_name": "gpt-5.1",
      "model_version": "2025-11-13",
      "scale_type": "GlobalProvisionedManaged",
      "capacity": 15,
      "capacity_unit": "PTU",
      "capacity_k_tpm": null,
      "input_tpm_per_ptu": 4750,
      "output_tokens_to_input_ratio": 8,
      "token_limit_strategy": "response_weighted_actual_tokens",
      "prompt_tokens_weight": 1,
      "completion_tokens_weight": 8,
      "weighted_tokens_per_minute": 71250,
      "apim_raw_tokens_per_minute": 8906,
      "input_equivalent_tokens_per_minute": 71250,
      "tokens_per_minute": 8906,
      "endpoints": {
        "azure_openai": {
          "endpoint": "https://aihub.gov.bc.ca/wlrs-water-form-assistant",
          "api_version": "2025-03-01-preview",
          "url": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/openai/deployments/gpt-5.1/chat/completions?api-version=2025-03-01-preview"
        },
        "openai_compatible": {
          "base_url": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/openai/v1",
          "model": "gpt-5.1",
          "url": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/openai/v1/chat/completions"
        }
      }
    },
    {
      "name": "gpt-4.1-mini",
      "model_name": "gpt-4.1-mini",
      "model_version": "2025-04-14",
      "scale_type": "GlobalStandard",
      "capacity": 1500,
      "capacity_unit": "k TPM",
      "capacity_k_tpm": 1500,
      "input_tpm_per_ptu": null,
      "output_tokens_to_input_ratio": null,
      "token_limit_strategy": "raw_tokens_per_minute",
      "prompt_tokens_weight": 1,
      "completion_tokens_weight": 1,
      "weighted_tokens_per_minute": 1500000,
      "apim_raw_tokens_per_minute": 1500000,
      "input_equivalent_tokens_per_minute": 1500000,
      "tokens_per_minute": 1500000,
      "endpoints": {
        "azure_openai": {
          "endpoint": "https://aihub.gov.bc.ca/wlrs-water-form-assistant",
          "api_version": "2025-03-01-preview",
          "url": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/openai/deployments/gpt-4.1-mini/chat/completions?api-version=2025-03-01-preview"
        },
        "openai_compatible": {
          "base_url": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/openai/v1",
          "model": "gpt-4.1-mini",
          "url": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/openai/v1/chat/completions"
        }
      }
    }
  ],
  "services": {
    "openai": {
      "enabled": true,
      "endpoints": {
        "azure_openai": "https://aihub.gov.bc.ca/wlrs-water-form-assistant",
        "openai_compatible": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/openai/v1",
        "api_version": "2025-03-01-preview"
      }
    },
    "document_intelligence": {
      "enabled": true,
      "endpoint": "https://aihub.gov.bc.ca/wlrs-water-form-assistant",
      "example": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-11-30"
    },
    "ai_search": {
      "enabled": true,
      "endpoint": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/ai-search",
      "example": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/ai-search/indexes/{index-name}/docs/search?api-version=2024-07-01"
    },
    "speech_services": {
      "enabled": true,
      "stt_endpoint": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/speech/recognition/conversation/cognitiveservices/v1?language=en-US",
      "tts_endpoint": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/cognitiveservices/v1"
    },
    "storage": {
      "enabled": true,
      "endpoint": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/storage",
      "example": "https://aihub.gov.bc.ca/wlrs-water-form-assistant/storage/{container-name}/{blob-name}"
    }
  }
}
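A response like the one above could drive a model dropdown or a service check along the following lines. This is a sketch only: `model_options` and `enabled_services` are hypothetical helper names, and the `sample` dictionary is a trimmed copy of the example response, not live data.

```python
def model_options(tenant_info: dict) -> list:
    """Return (deployment_name, label) pairs suitable for a UI dropdown."""
    options = []
    for model in tenant_info.get("models", []):
        label = f'{model["name"]} ({model["capacity"]} {model["capacity_unit"]})'
        options.append((model["name"], label))
    return options


def enabled_services(tenant_info: dict) -> list:
    """Return the sorted names of services whose `enabled` flag is true."""
    services = tenant_info.get("services", {})
    return sorted(name for name, cfg in services.items() if cfg.get("enabled"))


# Trimmed copy of the sample response (one service flipped to false for illustration).
sample = {
    "tenant": "wlrs-water-form-assistant",
    "models": [
        {"name": "gpt-5.1", "capacity": 15, "capacity_unit": "PTU"},
        {"name": "gpt-4.1-mini", "capacity": 1500, "capacity_unit": "k TPM"},
    ],
    "services": {
        "openai": {"enabled": True},
        "ai_search": {"enabled": True},
        "storage": {"enabled": False},
    },
}

print(model_options(sample))
print(enabled_services(sample))
```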

Field	Type	Description
`tenant`	string	The tenant identifier (matches the API path prefix).
`base_url`	string	Client-facing base URL for this tenant. Uses App Gateway URL when deployed; falls back to direct APIM gateway URL.
`models[].name`	string	Deployment name used in API calls (e.g., `gpt-4.1-mini`).
`models[].model_name`	string	Underlying Azure OpenAI model name.
`models[].model_version`	string	Model version date string.
`models[].scale_type`	string	Deployment scale type (e.g., `GlobalStandard`).
`models[].capacity`	number	Raw capacity value from `tenant.tfvars`. This is expressed in `capacity_unit`.
`models[].capacity_unit`	string	Capacity unit for the deployment. Current values are `k TPM` for quota-based deployments and `PTU` for provisioned deployments.
`models[].capacity_k_tpm`	number\|null	Allocated quota in thousands of tokens per minute for quota-based deployments. `null` for provisioned deployments.
`models[].input_tpm_per_ptu`	number\|null	Model-specific input TPM available per PTU for provisioned deployments. `null` for quota-based deployments.
`models[].output_tokens_to_input_ratio`	number\|null	For provisioned deployments, how many input-equivalent tokens one output token consumes. `null` for quota-based deployments.
`models[].token_limit_strategy`	string	The APIM enforcement mode for the model. Current values are `raw_tokens_per_minute` for quota-backed models and `response_weighted_actual_tokens` for provisioned (PTU) deployments.
`models[].prompt_tokens_weight`	number	The multiplier APIM applies to prompt tokens when computing weighted token counts for PTU rate limiting.
`models[].completion_tokens_weight`	number	The multiplier APIM applies to completion tokens when computing weighted token counts for PTU rate limiting.
`models[].weighted_tokens_per_minute`	number	The weighted TPM budget used by APIM. For provisioned (PTU) deployments this matches the Foundry input-equivalent TPM ceiling; for quota-based models it matches the raw TPM cap.
`models[].apim_raw_tokens_per_minute`	number	The raw prompt + completion token ceiling. For quota-based models this is the enforced limit. For provisioned (PTU) deployments this is retained as the streaming/raw fallback cap.
`models[].input_equivalent_tokens_per_minute`	number	The Foundry throughput ceiling expressed in input-equivalent tokens per minute. For quota-based deployments this matches the APIM cap. For provisioned deployments this is typically higher than the raw fallback cap because output tokens are weighted more heavily.
`models[].tokens_per_minute`	number	Legacy alias for `apim_raw_tokens_per_minute`. Use the explicit fields above for new consumers.
`models[].endpoints.azure_openai.endpoint`	string	Endpoint for the Azure OpenAI SDK (`AzureOpenAI(azure_endpoint=...)`).
`models[].endpoints.azure_openai.api_version`	string	Recommended API version for Azure OpenAI SDK calls.
`models[].endpoints.azure_openai.url`	string	Full URL for this deployment’s chat completions via Azure OpenAI SDK format.
`models[].endpoints.openai_compatible.base_url`	string	Base URL for the OpenAI SDK (`OpenAI(base_url=...)`).
`models[].endpoints.openai_compatible.model`	string	Model name to pass in request body for the OpenAI SDK.
`models[].endpoints.openai_compatible.url`	string	Full URL for this deployment’s chat completions via OpenAI-compatible format.
`services.<name>.enabled`	bool	`true` when the service is enabled for this tenant. When `false`, `enabled` is the only key present for that service.
`services.openai.endpoints`	object	Top-level OpenAI endpoints: `azure_openai`, `openai_compatible`, and recommended `api_version`.
`services.document_intelligence.endpoint`	string	Base endpoint for Document Intelligence API calls.
`services.document_intelligence.example`	string	Example URL for the prebuilt-layout analyze operation.
`services.ai_search.endpoint`	string	Base endpoint for AI Search API calls (includes `/ai-search` prefix).
`services.ai_search.example`	string	Example URL for a search index query.
`services.speech_services.stt_endpoint`	string	Full endpoint for Speech-to-Text recognition requests (includes path and default language).
`services.speech_services.tts_endpoint`	string	Full endpoint for Text-to-Speech synthesis requests.
`services.storage.endpoint`	string	Base endpoint for Storage proxy requests (includes `/storage` prefix).
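To show how the limit fields relate, the sketch below picks the budget a client might plan against for a given model entry. It follows the field descriptions above: provisioned (PTU) models use the weighted budget for non-streaming traffic and the raw fallback cap for streaming, while quota-based models use the raw cap. `planning_limit` is a hypothetical helper, and this is a client-side approximation, not the gateway's actual enforcement code.

```python
def planning_limit(model: dict, streaming: bool = False) -> dict:
    """Return the per-minute budget and token weights that apply to a request type."""
    weighted = model["token_limit_strategy"] == "response_weighted_actual_tokens"
    if weighted and not streaming:
        # Non-streaming PTU traffic: weighted budget with per-token multipliers.
        return {
            "limit": model["weighted_tokens_per_minute"],
            "prompt_weight": model["prompt_tokens_weight"],
            "completion_weight": model["completion_tokens_weight"],
        }
    # Quota-based models, and streaming on PTU models: raw prompt + completion cap.
    return {
        "limit": model["apim_raw_tokens_per_minute"],
        "prompt_weight": 1,
        "completion_weight": 1,
    }


# Limit-related fields copied from the sample response.
gpt_51 = {
    "token_limit_strategy": "response_weighted_actual_tokens",
    "prompt_tokens_weight": 1,
    "completion_tokens_weight": 8,
    "weighted_tokens_per_minute": 71250,
    "apim_raw_tokens_per_minute": 8906,
}
gpt_41_mini = {
    "token_limit_strategy": "raw_tokens_per_minute",
    "prompt_tokens_weight": 1,
    "completion_tokens_weight": 1,
    "weighted_tokens_per_minute": 1500000,
    "apim_raw_tokens_per_minute": 1500000,
}

print(planning_limit(gpt_51)["limit"], planning_limit(gpt_51, streaming=True)["limit"])
```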

Error Responses

Status	Cause
`401`	Missing or invalid subscription key.
`405`	Non-GET method used.

Infrastructure

Implemented purely in APIM policy. Enabled for every tenant unconditionally (tenant_info_enabled = true in stacks/apim/locals.tf). No Key Vault grants required.

Source: infra-ai-hub/params/apim/api_policy.xml.tftpl (tenant_info_enabled block).

Integration Tests

Both endpoints have Python integration test coverage in tests/integration/tests/:

Suite	Endpoint	Proxy required
`test_apim_key_rotation.py`	`/internal/apim-keys`	Yes — Key Vault is private-only. Runs via the chisel/privoxy tunnel in CI.
`test_tenant_info.py`	`/internal/tenant-info`	No — response is static, no Key Vault calls. Runs in the direct (no-proxy) CI step.

# Run tenant-info tests locally
cd tests/integration
./run-tests.sh --env test tenant-info

# Run apim-keys tests (requires az login + hub Key Vault access)
./run-tests.sh --env test --group proxy apim-key-rotation