AI Services Hub
Azure Landing Zone Infrastructure

Available AI Services

This page describes the services that the platform makes available to tenant teams, what a tenant actually receives when it is onboarded, and how access is controlled. It is intended to answer the practical question, “What can my team use here?”

All services run behind private endpoints, not on the public internet. Tenant access goes through the platform gateway by using the team's subscription key. See the internal gateway endpoints reference for the gateway-specific details.
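As a rough illustration of the access pattern described above, the sketch below assembles the pieces of a gateway request. It assumes the platform keeps APIM's default subscription header name, `Ocp-Apim-Subscription-Key`; the platform may use a different one. The base URL, route, and key in the usage lines are placeholders, not real platform values. The internal gateway endpoints reference remains the authority for all of them.

```python
import json

# Assumption: APIM's default subscription-key header name.
# The platform gateway may be configured with a different header.
SUBSCRIPTION_HEADER = "Ocp-Apim-Subscription-Key"


def build_gateway_request(base_url: str, path: str, subscription_key: str, body: dict) -> dict:
    """Return the URL, headers, and encoded body for a POST through the gateway."""
    if not subscription_key:
        raise ValueError("a subscription key is required for gateway access")
    return {
        "url": base_url.rstrip("/") + "/" + path.lstrip("/"),
        "headers": {
            SUBSCRIPTION_HEADER: subscription_key,
            "Content-Type": "application/json",
        },
        "data": json.dumps(body).encode("utf-8"),
    }


# Placeholder values for illustration only (hypothetical host, route, and key).
request = build_gateway_request(
    "https://gateway.example.invalid/",
    "/some/route",
    "<your-subscription-key>",
    {"hello": "world"},
)
print(request["url"])
```

The returned dictionary can be passed to whichever HTTP client your team already uses. Nothing is sent over the network by this sketch.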

Quick Navigation

  • Model Providers: OpenAI baseline plus tenant-specific Cohere and Mistral
  • Cognitive Services: Document Intelligence, Language PII, Speech
  • What Each Tenant Gets: Shared vs. dedicated resource breakdown

Model Providers

Each tenant accesses model deployments through its dedicated AI Foundry Project endpoint, routed via APIM. OpenAI is the baseline model family for tenants across environments. Additional provider families such as Cohere and Mistral are enabled only when they are explicitly deployed for a tenant.

OpenAI Baseline Models

Chat & Reasoning Models

| Model | Kind | Best For | Quota (Canada East) |
| --- | --- | --- | --- |
| gpt-4.1 | Chat | Complex reasoning, long context | 30,000 TPM |
| gpt-4.1-mini | Chat | Fast, cost-efficient tasks | 150,000 TPM |
| gpt-4.1-nano | Chat | High-throughput simple tasks | 150,000 TPM |
| gpt-4o | Chat | Multimodal inputs (text + images) | 30,000 TPM |
| gpt-4o-mini | Chat | Fast multimodal tasks | 150,000 TPM |
| gpt-5-mini | Chat | Next-gen compact model | 10,000 TPM |
| gpt-5-nano | Chat | Next-gen ultra-fast tasks | 150,000 TPM |
| gpt-5.1-chat | Chat (Preview) | Next-gen preview chat model | 5,000 TPM |
| o1 | Reasoning | Step-by-step scientific / math problems | 5,000 TPM |
| o3-mini | Reasoning | Cost-efficient multi-step reasoning | 5,000 TPM |
| o4-mini | Reasoning | Latest compact reasoning model | 10,000 TPM |
| gpt-5.1-codex-mini | Code | Code generation & completion | 10,000 TPM |

Embedding Models

Embedding models convert text into dense vector representations — essential for search, RAG (Retrieval Augmented Generation), and semantic similarity tasks.
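To make "semantic similarity" concrete, here is a minimal sketch of cosine similarity, a common way to compare two embedding vectors. The three-dimensional vectors are toy stand-ins chosen for illustration; real embeddings from the models listed on this page have 1,536 or 3,072 dimensions, and this page does not prescribe a particular similarity measure.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (illustrative values only).
query = [1.0, 0.0, 1.0]
close_doc = [2.0, 0.0, 2.0]   # same direction as the query
far_doc = [0.0, 1.0, 0.0]     # orthogonal to the query

print(cosine_similarity(query, close_doc))
print(cosine_similarity(query, far_doc))
```

A score near 1 means the two texts point in nearly the same direction in embedding space; a score near 0 means they are unrelated.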

| Model | Dimensions | Best For | Quota (Canada East) |
| --- | --- | --- | --- |
| text-embedding-3-large | 3,072 | Highest accuracy RAG retrieval | 10,000 TPM |
| text-embedding-3-small | 1,536 | Fast, cost-efficient semantic search | 10,000 TPM |
| text-embedding-ada-002 | 1,536 | Legacy compatibility | 10,000 TPM |
Note: TPM means tokens per minute. The OpenAI quotas shown above are subscription-wide limits, and each tenant is allocated a share of them. The current test/dev allocation is typically 1% per tenant, but provider-specific deployments can use different quotas. model-deployments.md is the source of truth for current allocations.
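The arithmetic behind a per-tenant share is simple. The sketch below applies the typical 1% test/dev share to a few of the subscription-wide quotas listed above. The function name and the round-down behaviour are assumptions made for illustration; actual allocations come from model-deployments.md.

```python
import math


def tenant_tpm(subscription_tpm: int, share_percent: float = 1.0) -> int:
    """Tokens per minute available to one tenant, rounded down (assumed rounding)."""
    if not 0 < share_percent <= 100:
        raise ValueError("share_percent must be greater than 0 and at most 100")
    return math.floor(subscription_tpm * share_percent / 100)


# Subscription-wide quotas taken from the tables above.
subscription_quotas = {
    "gpt-4.1": 30_000,
    "gpt-4.1-mini": 150_000,
    "o1": 5_000,
}

for model, quota in subscription_quotas.items():
    print(f"{model}: {tenant_tpm(quota)} TPM per tenant at a 1% share")
```

At a 1% share, a tenant would receive 300 TPM of gpt-4.1, 1,500 TPM of gpt-4.1-mini, and 50 TPM of o1.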

Tenant-Specific Provider Additions

The current test deployment for ai-hub-admin includes additional non-OpenAI provider families. These are not currently universal tenant defaults and should be treated as explicitly assigned deployments.

| Provider | Currently Documented Deployments | Current Scope | Notes |
| --- | --- | --- | --- |
| Cohere | cohere-command-a, Cohere-rerank-v4.0-pro, Cohere-rerank-v4.0-fast | ai-hub-admin only (test) | Configured in tenant IaC and deployed through the Foundry stack. Several other Cohere catalog models were evaluated but are not currently available in BC Gov Private Marketplace. |
| Mistral AI | Mistral-Large-3, mistral-document-ai-2505, mistral-document-ai-2512 | ai-hub-admin only (test) | Chat traffic uses the OpenAI-compatible route /openai/v1/chat/completions. Document AI uses /providers/mistral/azure/ocr. The legacy non-OpenAI Mistral chat route is intentionally rejected by APIM. |
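The Mistral routing rule described above can be sketched as a small lookup. The two route strings and the three deployment names come from this page; the function name, the explicit mapping, and the choice to reject unknown deployments are illustrative assumptions, not the gateway's actual implementation.

```python
OPENAI_COMPATIBLE_CHAT_ROUTE = "/openai/v1/chat/completions"
MISTRAL_OCR_ROUTE = "/providers/mistral/azure/ocr"

# Mistral deployments currently documented for ai-hub-admin (test).
MISTRAL_ROUTES = {
    "Mistral-Large-3": OPENAI_COMPATIBLE_CHAT_ROUTE,
    "mistral-document-ai-2505": MISTRAL_OCR_ROUTE,
    "mistral-document-ai-2512": MISTRAL_OCR_ROUTE,
}


def mistral_route(deployment: str) -> str:
    """Return the gateway route for a documented Mistral deployment."""
    try:
        return MISTRAL_ROUTES[deployment]
    except KeyError:
        # Hypothetical behaviour: refuse to guess a route for an unlisted deployment.
        raise ValueError(f"no documented route for deployment {deployment!r}") from None


print(mistral_route("Mistral-Large-3"))
print(mistral_route("mistral-document-ai-2512"))
```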

Cognitive Services

Beyond the model providers listed above, each tenant can access the following Azure AI capabilities through the hub.

Document Intelligence

Extract structured data from unstructured documents — forms, invoices, PDFs, scanned images. Supports custom models trained on your document types.

Dedicated instance per tenant — View setup guide →

Language Service / PII Detection

Detect and redact Personally Identifiable Information (PII) from text using Microsoft's pre-trained ML models. Supports named entity recognition, sentiment analysis, and key phrase extraction.

Shared hub service — View PII guide →
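For orientation, the sketch below builds a request body in the shape used by the Azure AI Language text-analysis REST API for PII detection. The helper name is hypothetical, and how the hub exposes this service (route, API version, required headers) is not shown here and should be taken from the PII guide.

```python
def build_pii_request(texts: list[str], language: str = "en") -> dict:
    """Build a PII-detection request body for one or more text documents."""
    if not texts:
        raise ValueError("at least one text document is required")
    return {
        "kind": "PiiEntityRecognition",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {
            "documents": [
                # Document ids only need to be unique within a single request.
                {"id": str(index), "language": language, "text": text}
                for index, text in enumerate(texts, start=1)
            ]
        },
    }


# Illustrative input only; the name and phone number are made up.
payload = build_pii_request(["Call Jane Doe at 555-0100."])
print(payload["analysisInput"]["documents"][0]["id"])
```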

Speech Services

Convert speech to text and text to speech. Supports real-time transcription, batch audio processing, and custom voice models. Optimized for Canadian English and French.

Dedicated instance per tenant — See FAQ for details

AI Search

Managed vector and full-text search service for building RAG pipelines, semantic search over documents, and hybrid retrieval. Integrates directly with AI Foundry for grounding model responses.

Dedicated instance per tenant

What Each Tenant Gets

The hub follows a shared-platform, dedicated-data model: expensive control-plane infrastructure is shared, while all data-plane services are isolated per tenant.

Shared Platform Infrastructure

Provisioned once, used by all tenants. Cost is split proportionally.

  • AI Foundry Hub — shared model registry & endpoint
  • API Management (APIM) — unified API gateway
  • Application Gateway + WAF — TLS termination, routing
  • Virtual Network & Private DNS — private connectivity
  • Azure Container Registry — shared container images
  • Log Analytics Workspace — centralized monitoring

Dedicated Per-Tenant Resources

Provisioned exclusively for your team. Cost is directly attributed to you.

  • AI Foundry Project — your own API endpoint & prompt flows
  • Model Deployments — OpenAI baseline plus any tenant-specific Cohere or Mistral deployments assigned to your project
  • Document Intelligence — your own instance & models
  • AI Search — your own indexes for RAG
  • Speech Services — your own instance
  • Key Vault — your secrets, isolated from other tenants
  • Storage Account — your documents & data
  • Cosmos DB — your NoSQL database (when enabled)
  • Resource Group: rg-{tenant}-{env}
  • APIM Subscription Key — your API credential

Cross-tenant isolation is enforced at the network and identity layers. Because each tenant has its own Key Vault and Resource Group, one tenant cannot access another tenant's data, even though tenants share APIM and AI Foundry Hub infrastructure. See ADR-010: Multi-Tenant Isolation Model for the full security rationale.
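
The resource group naming pattern in the list above, rg-{tenant}-{env}, can be expressed as a small helper. The pattern itself comes from this page; the function name and the validation rule (lowercase letters, digits, and single hyphens) are assumptions added for illustration.

```python
import re

# Assumed validation rule: lowercase alphanumeric segments joined by single hyphens.
_SEGMENT = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def resource_group_name(tenant: str, env: str) -> str:
    """Format a per-tenant resource group name as rg-{tenant}-{env}."""
    for label, value in (("tenant", tenant), ("env", env)):
        if not _SEGMENT.fullmatch(value):
            raise ValueError(f"{label} must be lowercase letters, digits, and hyphens: {value!r}")
    return f"rg-{tenant}-{env}"


print(resource_group_name("ai-hub-admin", "test"))
```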

Accessing Your API Credentials

Approved tenant administrators can view and copy their APIM subscription keys for each environment directly from the portal — no platform team involvement required.

Credentials Panel

Navigate to your tenant's detail page in the Tenant Onboarding Portal. If your account is listed as a tenant admin and the tenant is approved, a Credentials panel appears below the configuration summary.
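
The visibility rule above has two conditions that must both hold. The sketch below models it with a hypothetical `Tenant` type; the field names and account identifiers are illustrative and do not reflect the portal's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Tenant:
    """Hypothetical minimal view of a tenant record in the portal."""
    name: str
    approved: bool
    admins: set[str] = field(default_factory=set)


def credentials_panel_visible(tenant: Tenant, account: str) -> bool:
    """The panel appears only for listed tenant admins of an approved tenant."""
    return tenant.approved and account in tenant.admins


# Illustrative records only.
approved = Tenant("team-a", approved=True, admins={"alice@example.invalid"})
pending = Tenant("team-b", approved=False, admins={"alice@example.invalid"})

print(credentials_panel_visible(approved, "alice@example.invalid"))
print(credentials_panel_visible(approved, "bob@example.invalid"))
print(credentials_panel_visible(pending, "alice@example.invalid"))
```

Only the first check passes: the second account is not a listed admin, and the third tenant is not yet approved.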

What You Can See

Security Notes

APIM Endpoints

Full API reference for tenant-info and apim-keys endpoints.

View Reference →

Key Rotation

How your APIM subscription key is rotated automatically.

View Guide →

Cost Tracking

How costs are allocated and attributed per tenant.

View Costs →