Available AI Services

This page describes the services that the platform makes available to tenant teams, what a tenant actually receives when it is onboarded, and how access is controlled. It is intended to answer the practical question, “What can my team use here?”

All services run behind private endpoints, not on the public internet. Tenant access goes through the platform gateway by using the team's subscription key. See the internal gateway endpoints reference for the gateway-specific details.

Model Providers

OpenAI baseline plus tenant-specific Cohere and Mistral

Cognitive Services

Document Intelligence, PII detection, Speech

What Each Tenant Gets

Shared vs. dedicated resource breakdown

Model Providers

Each tenant accesses model deployments through its dedicated AI Foundry Project endpoint, routed via APIM. OpenAI is the baseline model family for tenants across environments. Additional provider families such as Cohere and Mistral are enabled only when they are explicitly deployed for a tenant.

OpenAI Baseline Models

Chat & Reasoning Models

Model	Kind	Best For	Quota (Canada East)
gpt-4.1	Chat	Complex reasoning, long context	30,000 TPM
gpt-4.1-mini	Chat	Fast, cost-efficient tasks	150,000 TPM
gpt-4.1-nano	Chat	High-throughput simple tasks	150,000 TPM
gpt-4o	Chat	Multimodal inputs (text + images)	30,000 TPM
gpt-4o-mini	Chat	Fast multimodal tasks	150,000 TPM
gpt-5-mini	Chat	Next-gen compact model	10,000 TPM
gpt-5-nano	Chat	Next-gen ultra-fast tasks	150,000 TPM
gpt-5.1	Provisioned Chat	Provisioned GPT-5.1 base model for sustained tenant workloads	4,750 input TPM/PTU; output weighted 8x
gpt-5.1-chat	Chat Preview	Next-gen preview chat model	5,000 TPM
o1	Reasoning	Step-by-step scientific / math problems	5,000 TPM
o3-mini	Reasoning	Cost-efficient multi-step reasoning	5,000 TPM
o4-mini	Reasoning	Latest compact reasoning model	10,000 TPM
gpt-5.1-codex-mini	Code	Code generation & completion	10,000 TPM

Embedding Models

Embedding models convert text into dense vector representations — essential for search, RAG (Retrieval Augmented Generation), and semantic similarity tasks.

Model	Dimensions	Best For	Quota (Canada East)
text-embedding-3-large	3,072	Highest accuracy RAG retrieval	10,000 TPM
text-embedding-3-small	1,536	Fast, cost-efficient semantic search	10,000 TPM
text-embedding-ada-002	1,536	Legacy compatibility	10,000 TPM

💡

TPM = Tokens Per Minute. OpenAI quotas shown above are subscription-wide limits; each tenant is allocated a share. Current test/dev allocation is typically 1% per tenant, but provider-specific deployments can use different quotas. See model-deployments.md for the current allocation source of truth.

💡

Provisioned throughput uses input-equivalent TPM, not a simple raw input + output token total. For GPT-5.1, APIM now enforces non-streaming requests on a dedicated PTU backend using actual response usage with prompt tokens weighted x1 and completion tokens weighted x8. The lower raw-token cap is retained only as a fallback for streaming requests, where APIM cannot reliably read final SSE usage before returning the stream.

Tenant-Specific Provider Additions

The current test deployment for ai-hub-admin includes additional non-OpenAI provider families. These are not currently universal tenant defaults and should be treated as explicitly assigned deployments.

Provider	Currently Documented Deployments	Current Scope	Notes
Cohere	`cohere-command-a`, `Cohere-rerank-v4.0-pro`, `Cohere-rerank-v4.0-fast`	ai-hub-admin only test	Configured in tenant IaC and deployed through the Foundry stack. Several other Cohere catalog models were evaluated but are not currently available in BC Gov Private Marketplace.
Mistral AI	`Mistral-Large-3`, `mistral-document-ai-2505`, `mistral-document-ai-2512`	ai-hub-admin only test	Chat traffic uses the OpenAI-compatible route `/openai/v1/chat/completions`. Document AI uses `/providers/mistral/azure/ocr`. The legacy non-OpenAI Mistral chat route is intentionally rejected by APIM.

Cognitive Services

Beyond the model providers listed above, each tenant can access the following Azure AI capabilities through the hub.

Platform rule: Azure AI Language is reserved for PII detection in this hub. If a tenant needs summarization, sentiment, classification, question answering, or orchestration behavior, implement that workload on tenant Azure AI Foundry model deployments instead of the shared Language service.

Document Intelligence

Extract structured data from unstructured documents — forms, invoices, PDFs, scanned images. Supports custom models trained on your document types.

Dedicated instance per tenant — View setup guide →

Azure AI Language / PII Detection

Detect and redact Personally Identifiable Information (PII) from text using Microsoft's pre-trained PII models. In this platform, the shared Language integration is scoped to PII detection only.

Shared hub service — View PII guide →

Speech Services

Convert speech to text and text to speech. Supports real-time transcription, batch audio processing, and custom voice models. Optimized for Canadian English and French.

Dedicated instance per tenant — See FAQ for details

AI Search

Managed vector + full-text search service for building RAG pipelines, semantic search over documents, and hybrid retrieval. Integrates directly with AI Foundry for grounding model responses.

Dedicated instance per tenant

What Each Tenant Gets

The hub operates on a shared platform, dedicated data model. Expensive control-plane infrastructure is shared; all data-plane services are isolated per tenant.

Shared Platform Infrastructure

Provisioned once, used by all tenants. Cost is split proportionally.

AI Foundry Hub — shared model registry & endpoint
API Management (APIM) — unified API gateway
Application Gateway + WAF — TLS termination, routing
Virtual Network & Private DNS — private connectivity
Azure Container Registry — shared container images
Log Analytics Workspace — centralized monitoring

Dedicated Per-Tenant Resources

Provisioned exclusively for your team. Cost is directly attributed to you.

AI Foundry Project — your own API endpoint & prompt flows
Model Deployments — OpenAI baseline plus any tenant-specific Cohere or Mistral deployments assigned to your project
Document Intelligence — your own instance & models
AI Search — your own indexes for RAG
Speech Services — your own instance
Key Vault — your secrets, isolated from other tenants
Storage Account — your documents & data
Cosmos DB — your NoSQL database (when enabled)
Resource Group — rg-{tenant}-{env}
APIM Subscription Key — your API credential

Cross-tenant isolation is enforced at the network and identity layer. Dedicated Key Vaults and Resource Groups mean one tenant cannot access another tenant's data — even if they share APIM or AI Foundry Hub infrastructure. See ADR-010: Multi-Tenant Isolation Model for the full security rationale.

Accessing Your API Credentials

Approved tenant administrators can view and copy their APIM subscription keys for each environment directly from the portal — no platform team involvement required.

Credentials Panel

Navigate to your tenant's detail page in the Tenant Onboarding Portal. If your account is listed as a tenant admin and the tenant is approved, a Credentials panel appears below the configuration summary.

What You Can See

Three environment tabs (dev / test / prod) — credentials load on demand when you select a tab.
Primary and Secondary subscription keys — values are always masked. Use the copy button to place a key on your clipboard.
Rotation metadata (when rotation is enabled) — shows last rotation date, next scheduled rotation, and which slot is currently safe to use.
Tenant info panel (expandable) — lists your base APIM URL, enabled services, and deployed model names and capacities.

Security Notes

Key values are never displayed as text in the page. Only copy-to-clipboard is supported.
Access is restricted to users listed in your tenant's admin_users configuration.
Each environment's credentials are fetched independently from the hub Key Vault using the portal's Managed Identity.

APIM Endpoints

Full API reference for tenant-info and apim-keys endpoints.

View Reference →

Key Rotation

How your APIM subscription key is rotated automatically.

View Guide →

Cost Tracking

How costs are allocated and attributed per tenant.

View Costs →