Integrations Guide
This guide covers how to integrate with the Document Intelligence Platform programmatically — sending documents in for processing and retrieving the extracted results.
All API endpoints in this guide are relative to your backend service URL (e.g. https://your-host:3002); prefix each path with that URL when making requests. See the API Reference for the full OpenAPI specification.
Authentication
The platform supports two authentication methods. Use API Keys for service-to-service integrations and automated pipelines, or Bearer Tokens for user-facing applications that go through the Keycloak SSO login flow.
API Key Authentication
Best for automated pipelines, scripts, and service-to-service calls. Pass your key in the X-API-Key header.
Generate a key by logging into the platform with your IDIR and navigating to Settings.
Bearer Token (SSO)
Best for user-facing applications. It uses the Keycloak OAuth 2.0 Authorization Code flow, and the backend handles all interactions with the identity provider.
Tokens are passed in the Authorization: Bearer <token> header.
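In practice, the two methods differ only in which header a request carries. As a minimal sketch, the shell function below picks the header from whichever credential is set. DIP_API_KEY, DIP_TOKEN, and dip_auth_header are hypothetical names used only for illustration, and preferring the API key when both are set is an arbitrary choice, not platform behaviour.

```shell
# Hypothetical helper: print the auth header for whichever credential is set.
# DIP_API_KEY and DIP_TOKEN are illustrative variable names, not ones the
# platform defines.
dip_auth_header() {
  if [ -n "${DIP_API_KEY:-}" ]; then
    # Service-to-service: API key in the X-API-Key header
    printf 'X-API-Key: %s\n' "$DIP_API_KEY"
  elif [ -n "${DIP_TOKEN:-}" ]; then
    # User-facing: SSO access token in the Authorization header
    printf 'Authorization: Bearer %s\n' "$DIP_TOKEN"
  else
    echo "set DIP_API_KEY or DIP_TOKEN" >&2
    return 1
  fi
}
```

A request could then be sent as: curl -H "$(dip_auth_header)" https://your-host:3002/api/documents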
Generating an API Key
- Log in with your IDIR — Open the Document Intelligence frontend and sign in through the Keycloak SSO login page using your IDIR credentials.
- Navigate to Settings — Click the Settings tab in the left sidebar.
- Generate API Key — Click the Generate API Key button. Your new key will be displayed in a modal dialog.
- Copy and store the key — Copy the key immediately and store it securely. The full key is only shown once at creation time.
Using your API Key in requests
Pass your key in the X-API-Key header on all API calls:
curl https://your-host:3002/api/documents \
-H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx"
Sending Documents In
Documents are submitted via the upload endpoint as base64-encoded payloads. On upload, the platform stores the file, creates a document record, and automatically starts a processing workflow via Temporal.
Upload Endpoint
POST /api/upload
Accepts a JSON body with the document file encoded as a base64 string. Processing begins immediately after upload — no separate "start" call is needed.
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| title | string | Yes | Display name for the document |
| file | string | Yes | Base64-encoded file content |
| file_type | enum | Yes | One of pdf, image, or scan |
| model_id | string | Yes | Azure Document Intelligence model to use (e.g. prebuilt-layout) |
| original_filename | string | No | Original file name for display purposes |
| metadata | object | No | Arbitrary key-value metadata to attach to the document |
| workflow_config_id | string | No | ID of a workflow configuration to use for processing. If omitted, the default pipeline runs. |
Upload a PDF document (curl example)
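# Encode a file to base64 on a single line (-w 0 is a GNU option;
# a portable alternative is: base64 < invoice.pdf | tr -d '\n')
FILE_B64=$(base64 -w 0 invoice.pdf)
# Upload the document
curl -X POST https://your-host:3002/api/upload \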
-H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx" \
-H "Content-Type: application/json" \
-d "{
\"title\": \"Invoice Q1-2026\",
\"file\": \"$FILE_B64\",
\"file_type\": \"pdf\",
\"model_id\": \"prebuilt-layout\",
\"original_filename\": \"invoice.pdf\",
\"metadata\": { \"department\": \"finance\", \"quarter\": \"Q1\" }
}"
Response:
{
"success": true,
"document": {
"id": "ef2dd8b2-cbed-4b30-a958-3c2484479ca4",
"title": "Invoice Q1-2026",
"original_filename": "invoice.pdf",
"file_type": "pdf",
"file_size": 245782,
"status": "ongoing_ocr",
"created_at": "2026-02-19T10:30:00.000Z"
}
}
Upload an image document (curl example)
FILE_B64=$(base64 -w 0 receipt.jpg)
curl -X POST https://your-host:3002/api/upload \
-H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx" \
-H "Content-Type: application/json" \
-d "{
\"title\": \"Expense Receipt\",
\"file\": \"$FILE_B64\",
\"file_type\": \"image\",
\"model_id\": \"prebuilt-layout\"
}"
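The upload examples above pass the entire base64 payload as a command-line argument, so large files can exceed the shell's argument-length limit. One workaround, sketched here under stated assumptions, is to write the request body to a file and let curl read it. build_upload_body is a hypothetical helper that fills in only the fields shown; it does no JSON escaping, so it assumes the title, model ID, and file name contain no double quotes or backslashes.

```shell
# Hypothetical helper: write the POST /api/upload JSON body to a file.
# Usage: build_upload_body FILE TITLE FILE_TYPE MODEL_ID OUTPUT_JSON
# Assumes TITLE, MODEL_ID and the file name need no JSON escaping.
build_upload_body() {
  local file="$1" title="$2" file_type="$3" model_id="$4" out="$5"
  # file_type must be one of the values listed in the request fields table
  case "$file_type" in
    pdf|image|scan) ;;
    *) echo "file_type must be pdf, image, or scan" >&2; return 1 ;;
  esac
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  {
    printf '{"title":"%s","file_type":"%s","model_id":"%s",' \
      "$title" "$file_type" "$model_id"
    printf '"original_filename":"%s","file":"' "$(basename "$file")"
    # -w 0 is GNU-only, so strip line wraps from base64 output instead
    base64 < "$file" | tr -d '\n'
    printf '"}\n'
  } > "$out"
}
```

For example, run build_upload_body invoice.pdf "Invoice Q1-2026" pdf prebuilt-layout body.json, then send it with curl -X POST https://your-host:3002/api/upload -H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx" -H "Content-Type: application/json" --data-binary @body.json. Using --data-binary sends the file byte for byte, whereas plain --data @file strips newlines.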
List available models
Query the available OCR models before uploading:
curl https://your-host:3002/api/models \
-H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx"
Response:
{
"models": ["prebuilt-layout", "prebuilt-invoice", "custom-model-v2"]
}
Workflow Configuration (Optional)
For advanced processing pipelines, create a workflow configuration using the Workflows API. Workflows are DAG-based graphs with 30+ built-in activities, including OCR, validation, classification, and human-in-the-loop (HITL) routing. Reference the workflow ID when uploading documents to route them through your custom pipeline.
Create a workflow configuration (curl example)
curl -X POST https://your-host:3002/api/workflows \
-H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx" \
-H "Content-Type: application/json" \
-d "{
\"name\": \"Invoice Processing\",
\"description\": \"OCR + validation + review pipeline\",
\"config\": {
\"nodes\": [...],
\"edges\": [...]
}
}"
Then reference it during upload:
curl -X POST https://your-host:3002/api/upload \
-H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx" \
-H "Content-Type: application/json" \
-d "{
\"title\": \"Invoice\",
\"file\": \"...\",
\"file_type\": \"pdf\",
\"model_id\": \"prebuilt-layout\",
\"workflow_config_id\": \"<workflow-id>\"
}"
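Because the nodes and edges can grow large, it may be easier to keep them in their own JSON file and wrap that file in the POST /api/workflows request body when sending. The function below is a hypothetical sketch of that approach. It assumes the config file already holds valid JSON with nodes and edges (their schema is not covered here), and that the name and description contain no double quotes or backslashes, since it performs no escaping.

```shell
# Hypothetical helper: wrap a workflow config file in the request body.
# Usage: build_workflow_body NAME DESCRIPTION CONFIG_JSON_FILE
# Assumes CONFIG_JSON_FILE holds valid JSON such as {"nodes": ..., "edges": ...}
# and that NAME and DESCRIPTION need no JSON escaping.
build_workflow_body() {
  local name="$1" description="$2" config_file="$3"
  [ -r "$config_file" ] || { echo "cannot read $config_file" >&2; return 1; }
  # The config file contents are inserted verbatim as the "config" value
  printf '{"name":"%s","description":"%s","config":%s}\n' \
    "$name" "$description" "$(cat "$config_file")"
}
```

For example, build_workflow_body "Invoice Processing" "OCR + validation + review pipeline" workflow.json > body.json, then POST it with curl --data-binary @body.json.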
Retrieving Processed Data
The platform processes documents asynchronously. After uploading, poll the document status until processing completes, then retrieve the extracted data.
Processing Flow
Documents move through a series of statuses, such as ongoing_ocr while OCR is running and completed_ocr once extraction has finished.
If the workflow includes a human review step, documents may also pass through review-related statuses.
The platform does not currently support webhooks. Poll GET /api/documents or GET /api/documents/:id/ocr to check when processing completes; an interval of 5 to 10 seconds is reasonable.
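A polling loop benefits from an upper bound so that a stalled document does not block a script forever. The function below is a sketch, not a platform utility: wait_for_status is a hypothetical name, the status command is passed in so it can wrap any curl call, and treating every status other than ongoing_ocr as final mirrors the end-to-end example at the end of this guide rather than a documented list of terminal statuses.

```shell
# Sketch of bounded polling (wait_for_status is a hypothetical helper).
# Usage: wait_for_status MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# COMMAND must print the document's current status.
# Assumes ongoing_ocr is the only in-progress status; empty output or "null"
# (e.g. an error body piped through jq) is treated as "not finished yet".
wait_for_status() {
  local max="$1" interval="$2" attempt=1 status
  shift 2
  while :; do
    status=$("$@") || status=""
    case "$status" in
      ""|null|ongoing_ocr) ;;                   # keep waiting
      *) printf '%s\n' "$status"; return 0 ;;   # any other status ends polling
    esac
    if [ "$attempt" -ge "$max" ]; then
      echo "no final status after $max attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
}
```

For example, define fetch_status() { curl -s "https://your-host:3002/api/documents/$DOC_ID/ocr" -H "X-API-Key: $API_KEY" | jq -r '.status'; } and run STATUS=$(wait_for_status 60 5 fetch_status), which gives up after roughly five minutes.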
Checking Document Status
List all documents to see their current processing status.
List all documents (curl example)
curl https://your-host:3002/api/documents \
-H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx"
Response (array of documents):
[
{
"id": "ef2dd8b2-cbed-4b30-a958-3c2484479ca4",
"title": "Invoice Q1-2026",
"original_filename": "invoice.pdf",
"file_type": "pdf",
"file_size": 245782,
"status": "completed_ocr",
"created_at": "2026-02-19T10:30:00.000Z",
"updated_at": "2026-02-19T10:31:15.000Z"
}
]
Retrieving OCR Results
Once a document reaches completed_ocr status, retrieve the structured extraction data.
Get OCR results (curl example)
curl https://your-host:3002/api/documents/ef2dd8b2-cbed-4b30-a958-3c2484479ca4/ocr \
-H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx"
Response:
{
"document_id": "ef2dd8b2-cbed-4b30-a958-3c2484479ca4",
"status": "completed_ocr",
"title": "Invoice Q1-2026",
"original_filename": "invoice.pdf",
"file_type": "pdf",
"file_size": 245782,
"created_at": "2026-02-19T10:30:00.000Z",
"updated_at": "2026-02-19T10:31:15.000Z",
"model_id": "prebuilt-layout",
"ocr_result": {
"id": "result-uuid",
"document_id": "ef2dd8b2-cbed-4b30-a958-3c2484479ca4",
"keyValuePairs": {
"InvoiceNumber": "INV-2026-001",
"Date": "2026-01-15",
"Total": "$1,234.56"
},
"processed_at": "2026-02-19T10:31:15.000Z"
}
}
The keyValuePairs object contains the extracted fields. Its structure depends on the model used — custom models return fields defined during training, while prebuilt models return standard fields for their document type.
Downloading the Original File
Retrieve the original uploaded file as a binary download.
Download original document (curl example)
curl -o invoice.pdf \
https://your-host:3002/api/documents/ef2dd8b2-cbed-4b30-a958-3c2484479ca4/download \
-H "Authorization: Bearer <your-sso-token>"
Returns the file with appropriate Content-Type and Content-Disposition headers.
Human-in-the-Loop Review
When workflows include confidence-based routing, documents that fall below confidence thresholds are routed to a human review queue. The HITL API enables building custom review interfaces or integrating review into existing tools.
Review Queue
GET /api/hitl/queue
Lists documents awaiting review. Supports filtering by status, priority, and assignment.
Queue Statistics
GET /api/hitl/queue/stats
Returns counts and metrics for the review queue — useful for dashboards and monitoring.
Review Session Lifecycle
Reviews follow a session-based model:
- Start a session: POST /api/hitl/sessions with the document ID to begin reviewing.
- Submit corrections: POST /api/hitl/sessions/:id/corrections to fix extracted fields.
- Complete the review with one of:
  - POST /api/hitl/sessions/:id/submit to approve the document
  - POST /api/hitl/sessions/:id/escalate to escalate for expert review
  - POST /api/hitl/sessions/:id/skip to skip and return the document to the queue
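The three completion endpoints differ only in their final path segment, so a client can map a reviewer's decision to a path. The function below is a hypothetical sketch: the decision words approve, escalate, and skip are illustrative labels, and only the three endpoint paths come from this guide.

```shell
# Hypothetical helper: map a review decision to its completion endpoint path.
# Usage: hitl_completion_path SESSION_ID approve|escalate|skip
hitl_completion_path() {
  local session_id="$1" decision="$2"
  case "$decision" in
    approve)  printf '/api/hitl/sessions/%s/submit\n' "$session_id" ;;
    escalate) printf '/api/hitl/sessions/%s/escalate\n' "$session_id" ;;
    skip)     printf '/api/hitl/sessions/%s/skip\n' "$session_id" ;;
    *) echo "decision must be approve, escalate, or skip" >&2; return 1 ;;
  esac
}
```

A reviewer tool could then call curl -X POST "https://your-host:3002$(hitl_completion_path "$SESSION_ID" approve)" -H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx".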
Review session workflow (curl examples)
# 1. Get the review queue
curl https://your-host:3002/api/hitl/queue \
-H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx"
# 2. Start a review session
curl -X POST https://your-host:3002/api/hitl/sessions \
-H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx" \
-H "Content-Type: application/json" \
-d "{ \"documentId\": \"ef2dd8b2-...\" }"
# 3. Submit corrections
curl -X POST https://your-host:3002/api/hitl/sessions/<session-id>/corrections \
-H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx" \
-H "Content-Type: application/json" \
-d "{
\"corrections\": [
{ \"field\": \"InvoiceNumber\", \"value\": \"INV-2026-002\" }
]
}"
# 4. Approve the document
curl -X POST https://your-host:3002/api/hitl/sessions/<session-id>/submit \
-H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx"
Analytics
Track review performance and throughput with the analytics endpoint:
Get HITL analytics (curl example)
curl "https://your-host:3002/api/hitl/analytics?from=2026-01-01&to=2026-02-19" \
-H "X-API-Key: dip_xxxxxxxxxxxxxxxxxxxx"
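When the date range comes from user input or a scheduler, a small guard can catch malformed dates before the request is sent. The function below is a hypothetical sketch: it checks only the YYYY-MM-DD shape used in the example above, not whether the dates exist or whether from precedes to, and the accepted format on the server side is an assumption.

```shell
# Hypothetical helper: build the analytics path for a date range.
# Checks only the YYYY-MM-DD shape, not calendar validity or ordering.
hitl_analytics_path() {
  local from="$1" to="$2" d
  for d in "$from" "$to"; do
    case "$d" in
      [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
      *) echo "expected YYYY-MM-DD, got: $d" >&2; return 1 ;;
    esac
  done
  printf '/api/hitl/analytics?from=%s&to=%s\n' "$from" "$to"
}
```

Keep the resulting URL in double quotes when passing it to curl; an unquoted & would make the shell run the command in the background and drop the to parameter.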
End-to-End Example
A complete integration walkthrough: authenticate, upload a document, wait for processing, and retrieve the results.
Complete Workflow
# 1. Set your API key and backend URL
API_KEY="dip_xxxxxxxxxxxxxxxxxxxx"
BASE_URL="https://your-host:3002"
# 2. Check available models
curl -s "$BASE_URL/api/models" \
-H "X-API-Key: $API_KEY" | jq .
# 3. Upload a document
FILE_B64=$(base64 -w 0 document.pdf)
DOC_ID=$(curl -s -X POST "$BASE_URL/api/upload" \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"title\": \"My Document\",
\"file\": \"$FILE_B64\",
\"file_type\": \"pdf\",
\"model_id\": \"prebuilt-layout\"
}" | jq -r '.document.id')
echo "Uploaded document: $DOC_ID"
# 4. Poll for completion
while true; do
STATUS=$(curl -s "$BASE_URL/api/documents/$DOC_ID/ocr" \
-H "X-API-Key: $API_KEY" | jq -r '.status')
echo "Status: $STATUS"
if [ "$STATUS" != "ongoing_ocr" ]; then break; fi
sleep 5
done
# 5. Retrieve OCR results
curl -s "$BASE_URL/api/documents/$DOC_ID/ocr" \
-H "X-API-Key: $API_KEY" | jq '.ocr_result.keyValuePairs'