Developer Documentation
Integration Guide
Everything you need to submit documents to DocuHelix via the API.
Overview
The DocuHelix API allows external systems to submit documents for automated ingestion, classification, and storage within your organization's DocuHelix tenant.
How it works
- Authenticate: Exchange your API client credentials for a short-lived JWT access token.
- Submit: Upload a document file along with optional routing hints and metadata.
- Receive confirmation: Get an immediate received acknowledgment with a tracking UID.
- Asynchronous processing: DocuHelix processes the document in the background:
  - Intake: File is staged and queued for downstream processing.
  - Classification: The system determines the document type, industry, and target cabinet.
  - Promotion: The document is stored in the appropriate cabinet and becomes available in the DocuHelix app.
Important
Document processing is fully asynchronous. The API returns immediately after accepting the file. Processing status can be tracked through the DocuHelix app.
Authentication
Obtain an access token by posting your API client credentials to the token endpoint.
Endpoint
POST /api/v1/auth/token
Headers
| Header | Required | Description |
|---|---|---|
| Content-Type | Yes | application/json |
| X-Tenant-ID | Yes | Your organization UID |
Request body
{
"client_id": "your-client-id",
"client_secret": "your-client-secret"
}
Response (200 OK)
{
"access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"token_type": "bearer",
"expires_in": 1800
}
| Field | Description |
|---|---|
| access_token | JWT token to use in subsequent requests |
| token_type | Always "bearer" |
| expires_in | Token lifetime in seconds (default: 1800 = 30 minutes) |
Example
curl -X POST https://api.docuhelix.com/api/v1/auth/token \
-H "Content-Type: application/json" \
-H "X-Tenant-ID: 01J8KP4QXRN3MV5YZWT6H2ABCD" \
-d '{
"client_id": "01J9ABC123DEF456GHI789JKLM",
"client_secret": "sk_live_a1b2c3d4e5f6g7h8i9j0..."
}'
Error responses
| Status | Code | Description |
|---|---|---|
| 400 | missing_tenant | X-Tenant-ID header is missing |
| 401 | invalid_credentials | Client ID or secret is wrong |
| 403 | client_disabled | API client has been deactivated |
| 429 | - | Rate limit exceeded (10 requests/minute) |
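The token exchange in this section can also be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names build_token_request and parse_token_response are hypothetical, while the endpoint, headers, and field names come from the tables above.

```python
import json
import urllib.request

BASE_URL = "https://api.docuhelix.com"  # base URL used in this guide's examples


def build_token_request(tenant_id, client_id, client_secret, base_url=BASE_URL):
    """Build (but do not send) the POST /api/v1/auth/token request."""
    body = json.dumps({"client_id": client_id, "client_secret": client_secret})
    return urllib.request.Request(
        url=base_url + "/api/v1/auth/token",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Tenant-ID": tenant_id},
        method="POST",
    )


def parse_token_response(raw_body):
    """Return (access_token, expires_in) from a 200 OK response body."""
    payload = json.loads(raw_body)
    return payload["access_token"], int(payload["expires_in"])
```

To send the request, pass it to urllib.request.urlopen, which raises urllib.error.HTTPError for the 4xx and 429 responses listed above.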
Request Headers
All authenticated requests require two headers:
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <access_token> |
| X-Tenant-ID | Yes | Your organization UID (must match the token's org claim) |
The X-Tenant-ID value is validated against the JWT token. If they don't match, the request is rejected with a 403 tenant_mismatch error. This prevents cross-tenant access.
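A client can catch a mismatch before sending by reading the token payload locally. The sketch below decodes the JWT payload without verifying the signature, since verification stays with the server. The claim name org_uid is an assumption: this guide only says the token carries an org claim, so inspect a real token to find the actual name.

```python
import base64
import json


def jwt_payload(token):
    """Decode the payload segment of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    segment = parts[1]
    # base64url segments omit padding; restore it before decoding.
    segment += "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(segment))


def tenant_matches(token, tenant_id, claim="org_uid"):
    """True if the token's org claim equals the X-Tenant-ID value.

    `claim` is a hypothetical claim name, not confirmed by this guide.
    """
    return jwt_payload(token).get(claim) == tenant_id
```

This is only a convenience check for catching configuration mistakes early. The server's 403 tenant_mismatch response remains the authoritative result.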
Document Ingestion
Submit a document for processing.
Endpoint
POST /api/v1/ingestion/documents
Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <access_token> |
| X-Tenant-ID | Yes | Your organization UID |
| Idempotency-Key | No | Unique key to prevent duplicate submissions |
Request body (multipart/form-data)
Required fields:
| Field | Type | Description |
|---|---|---|
| file | file | The document file (max 500 MB) |
Optional fields:
| Field | Type | Max Length | Description |
|---|---|---|---|
| filename | string | 255 | Override the original filename |
| industry_key | string | 80 | Industry identifier (e.g. "mortgage", "healthcare") |
| industry_uid | string | 26 | Industry ULID (validated against your tenant config) |
| cabinet_uid | string | 26 | Target cabinet ULID |
| cabinet_name | string | 255 | Target cabinet name (resolved to UID if cabinet_uid not provided) |
| module_key | string | 80 | Metadata module key |
| external_source | string | 200 | Name of the sending system (e.g. "salesforce", "workday") |
| external_reference_id | string | 200 | Your system's unique ID for this document |
| external_entity_type | string | 100 | Entity type in your system (e.g. "Employee", "Loan") |
| external_entity_id | string | 200 | Entity ID in your system |
| raw_document_type | string | 100 | Your classification label (e.g. "W-2", "Invoice") |
| metadata | JSON string | - | Custom key-value metadata |
| notes | string | 5000 | Internal notes |
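Checking field lengths before upload avoids a round trip that ends in a validation error. The sketch below copies the limits from the table above. The helper name check_fields is hypothetical, and counting length in characters rather than bytes is an assumption.

```python
import json

# Maximum lengths copied from the optional-fields table above.
MAX_LENGTHS = {
    "filename": 255,
    "industry_key": 80,
    "industry_uid": 26,
    "cabinet_uid": 26,
    "cabinet_name": 255,
    "module_key": 80,
    "external_source": 200,
    "external_reference_id": 200,
    "external_entity_type": 100,
    "external_entity_id": 200,
    "raw_document_type": 100,
    "notes": 5000,
}


def check_fields(fields):
    """Return a dict of field name -> list of problems (empty dict if none)."""
    problems = {}
    for name, value in fields.items():
        if name == "metadata":
            # metadata is sent as a JSON string, so it must parse as JSON.
            try:
                json.loads(value)
            except (TypeError, ValueError):
                problems[name] = ["must be a JSON string"]
        elif name not in MAX_LENGTHS:
            problems[name] = ["unknown field"]
        elif len(value) > MAX_LENGTHS[name]:
            problems[name] = [f"longer than {MAX_LENGTHS[name]} characters"]
    return problems
```

The return shape mirrors the details object of the API's validation errors, so client-side and server-side problems can be reported the same way.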
Supported file types
| Category | Types |
|---|---|
| Documents | PDF, Word (.doc, .docx), Excel (.xls, .xlsx), PowerPoint (.ppt, .pptx) |
| Text | Plain text, CSV |
| Images | PNG, JPEG, TIFF |
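A quick local check can filter out files that are likely to be refused. In this sketch the extension list is an assumption mapped from the type names above (the guide gives extensions only for Office formats), and so is reading 500 MB as 500 * 1024 * 1024 bytes. The server may also inspect content rather than file names.

```python
import os

# Extensions are an assumption mapped from the type names in the table above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
    ".txt", ".csv", ".png", ".jpg", ".jpeg", ".tif", ".tiff",
}
# Assumes "500 MB" means 500 * 1024 * 1024 bytes; the guide does not say.
MAX_FILE_BYTES = 500 * 1024 * 1024


def precheck_file(filename, size_bytes):
    """Return a list of reasons the upload would likely be refused."""
    reasons = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        reasons.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        reasons.append("file exceeds the 500 MB limit")
    return reasons
```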
Example
curl -X POST https://api.docuhelix.com/api/v1/ingestion/documents \
-H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
-H "X-Tenant-ID: 01J8KP4QXRN3MV5YZWT6H2ABCD" \
-F "file=@/path/to/closing-disclosure.pdf" \
-F "industry_key=mortgage" \
-F "cabinet_uid=01JABC123CABINET456UIDXYZ" \
-F "external_source=loan-origination-system" \
-F "external_reference_id=LOAN-2026-04-00183" \
-F "external_entity_type=Loan" \
-F "external_entity_id=loan-78432" \
-F "raw_document_type=Closing Disclosure" \
-F 'metadata={"loan_number":"2026-04-00183","borrower":"Jane Doe"}'
Response (201 Created)
{
"data": {
"uid": "01JNQR7V3KXMSW8YBT4FH6PZEA",
"org_uid": "01J8KP4QXRN3MV5YZWT6H2ABCD",
"api_client_uid": "01J9ABC123DEF456GHI789JKLM",
"correlation_uid": "01JNQR7V3KXMSW8YBT4FH6PZEB",
"status": "processing",
"content_sha256": "a3f2b8c1d4e5f6a7b8c9...f0a1",
"original_filename": "closing-disclosure.pdf",
"external_source": "loan-origination-system",
"external_reference_id": "LOAN-2026-04-00183",
"external_entity_type": "Loan",
"external_entity_id": "loan-78432",
"raw_document_type": "Closing Disclosure",
"industry_key": "mortgage",
"industry_uid": null,
"cabinet_uid": "01JABC123CABINET456UIDXYZ",
"module_key": null,
"received_at": "2026-04-03T21:15:00.000000Z",
"processed_at": null,
"created_at": "2026-04-03T21:15:00.000000Z",
"updated_at": "2026-04-03T21:15:00.000000Z"
}
}
Store the uid value. This is the ingestion request tracking identifier.
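One way to keep the uid is to extract it together with the fields that link it back to your own records. This is a minimal sketch; the helper name tracking_record and the choice of fields to persist are illustrative, not prescribed by the API.

```python
import json


def tracking_record(response_body):
    """Pull the identifiers worth persisting from a 201 Created body."""
    data = json.loads(response_body)["data"]
    return {
        "uid": data["uid"],  # ingestion request tracking identifier
        "status": data["status"],
        "external_reference_id": data.get("external_reference_id"),
        "content_sha256": data.get("content_sha256"),
    }
```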
Routing Behavior
How DocuHelix routes your document depends on how much context you provide.
Full routing (fastest path)
When you provide industry_key (or industry_uid) + cabinet_uid + module_key:
- All values are validated against your tenant configuration.
- If valid, the document is promoted directly into the target cabinet with the specified metadata schema.
- No classification step is needed.
Partial routing
When you provide some routing fields but not all (e.g. cabinet_uid only):
- Provided fields are validated.
- The system resolves the missing fields using classification.
No routing
When you provide only the file and no routing fields:
- The system runs full classification: industry detection, cabinet assignment, and document type identification.
- This path takes longer but requires no knowledge of your tenant's configuration.
Low confidence
When classification confidence is below the configured threshold:
- The document enters intake (a review queue).
- A team member reviews and manually assigns the correct routing in the DocuHelix app.
- No document is lost — it simply waits for human confirmation.
Recommendation: Provide as much routing information as possible. Full routing skips classification entirely and gives the fastest, most predictable results.
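To see which path a submission will take before sending it, the rules above can be expressed as a small function. This is a sketch under stated assumptions: the function name routing_mode is hypothetical, and cabinet_name is treated as a partial hint because the guide names cabinet_uid for full routing.

```python
def routing_mode(fields):
    """Classify a submission as 'full', 'partial', or 'none' routing."""
    provided = {name for name, value in fields.items() if value}
    has_industry = bool(provided & {"industry_key", "industry_uid"})
    if has_industry and {"cabinet_uid", "module_key"} <= provided:
        return "full"
    # cabinet_name counts as a routing hint here; treating it as partial
    # rather than full routing is an assumption.
    routing_fields = {"industry_key", "industry_uid", "cabinet_uid",
                      "cabinet_name", "module_key"}
    if provided & routing_fields:
        return "partial"
    return "none"
```

Logging the result alongside each submission makes it easier to explain why some documents take longer or land in the review queue.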
Response Behavior
The API is asynchronous. When you submit a document, the response confirms that DocuHelix has accepted and queued it — not that processing is complete.
Status lifecycle
| Status | Meaning |
|---|---|
| received | File accepted by the API |
| processing | File staged and queued for classification |
| processed | Handed off to the classification pipeline |
| rejected | Rejected (e.g. plan limits exceeded) |
| failed | An error occurred during processing |
The initial response typically returns processing. The document moves through the remaining statuses asynchronously.
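Client code that stores the status field may want to reject unknown values and flag the two failure states. The sketch below models the five statuses from the table. The names IngestionStatus and needs_attention are hypothetical.

```python
from enum import Enum


class IngestionStatus(Enum):
    RECEIVED = "received"
    PROCESSING = "processing"
    PROCESSED = "processed"
    REJECTED = "rejected"
    FAILED = "failed"


def needs_attention(status_value):
    """True for statuses that mean the submission did not go through."""
    status = IngestionStatus(status_value)  # raises ValueError if unknown
    return status in (IngestionStatus.REJECTED, IngestionStatus.FAILED)
```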
Error Handling
All error responses follow the same envelope:
{
"error": {
"code": "error_code",
"message": "Human-readable description",
"details": {}
}
}
Status codes
| Status | Code | Description |
|---|---|---|
| 400 | missing_tenant | X-Tenant-ID header is missing or empty |
| 400 | validation_error | Request body failed validation (see details for field errors) |
| 401 | invalid_credentials | Wrong client_id or client_secret |
| 401 | missing_token | No Authorization: Bearer header |
| 401 | invalid_token | JWT is expired, malformed, or has an invalid signature |
| 403 | client_disabled | API client has been deactivated by an admin |
| 403 | insufficient_scope | Token lacks the documents:ingest scope |
| 403 | tenant_mismatch | X-Tenant-ID doesn't match the token's organization |
| 422 | validation_error | Routing validation failed (invalid industry, cabinet, or module) |
| 429 | - | Rate limit exceeded |
| 500 | server_error | Unexpected internal error |
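Because every error shares one envelope, a single parser can cover all of the rows above. This is a minimal sketch: the ApiError class and parse_error helper are hypothetical names. It tolerates a missing or non-JSON body, since 429 has no documented code and its body format is not specified here.

```python
import json


class ApiError(Exception):
    """An error response decoded from the documented envelope."""

    def __init__(self, status, code, message, details):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.details = details

    @property
    def retryable(self):
        # 5xx and 429 may succeed later; other 4xx need a fixed request.
        return self.status >= 500 or self.status == 429


def parse_error(status, raw_body):
    """Build an ApiError from an HTTP status and a response body."""
    try:
        error = json.loads(raw_body)["error"]
    except (ValueError, KeyError, TypeError):
        error = None
    if not isinstance(error, dict):
        error = {}  # body was not the documented envelope
    return ApiError(status, error.get("code"), error.get("message", ""),
                    error.get("details") or {})
```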
Validation error example (422)
{
"error": {
"code": "validation_error",
"message": "Validation error",
"details": {
"cabinet_uid": ["Cabinet not found in this organization."],
"module_key": ["Module key not found for this organization."]
}
}
}
Best Practices
Provide routing when possible
Full routing (industry_key + cabinet_uid + module_key) skips classification, reduces processing time, and guarantees deterministic placement.
Use idempotency keys
Include an Idempotency-Key header to prevent duplicate documents when retrying failed requests. If a request with the same key was already processed successfully, the API returns the existing result instead of creating a duplicate.
curl -X POST https://api.docuhelix.com/api/v1/ingestion/documents \
-H "Authorization: Bearer ..." \
-H "X-Tenant-ID: ..." \
-H "Idempotency-Key: import-batch-2026-04-03-doc-00183" \
-F "file=@document.pdf"
Store the external reference
Always include external_reference_id and external_source. This creates a traceable link between DocuHelix and your source system, making it easy to reconcile records and debug issues.
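The same two fields can also seed the Idempotency-Key, so a retried upload always carries the same key. The sketch below hashes them with SHA-256. The key format is an assumption, since this guide does not state any format or length rules for Idempotency-Key.

```python
import hashlib


def idempotency_key(external_source, external_reference_id):
    """Derive a stable Idempotency-Key from the two reference fields."""
    # A newline separator keeps ("ab", "c") distinct from ("a", "bc").
    raw = f"{external_source}\n{external_reference_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()
```

A derived key means the same source document always maps to the same submission. If you intentionally resubmit a new version under the same reference ID, add a version or date component to the input.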
Retry safely
If a request fails with a 5xx error or a network timeout:
- Retry with the same Idempotency-Key.
- Use exponential backoff (1s, 2s, 4s, 8s...).
- Cap retries at 5 attempts.
Do not retry 4xx errors — fix the request first.
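The retry rules above can be sketched as a small loop. Here send is a caller-supplied function (hypothetical) that performs one request with a fixed Idempotency-Key and returns the HTTP status code instead of raising for HTTP errors. Treating 429 as retryable follows the rate-limit guidance in this guide, and reading "5 attempts" as five total tries is an assumption.

```python
import time


def send_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a status that should not be retried.

    `send` must return the HTTP status code; network failures and
    timeouts are expected to surface as OSError.
    """
    for attempt in range(max_attempts):
        try:
            status = send()
        except OSError:  # includes TimeoutError and connection errors
            status = None
        if status is not None and status < 500 and status != 429:
            return status  # success, or a 4xx that needs a fixed request
        if attempt == max_attempts - 1:
            break
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, 8s
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

The sleep parameter is injectable so the loop can be exercised in tests without waiting.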
Cache your access token
Tokens are valid for 30 minutes. Request a new token only when the current one expires. Do not request a new token for every API call.
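A small cache keeps token requests well under the issuance limit. In this sketch, fetch_token is a caller-supplied function (hypothetical) that calls the token endpoint and returns the token with its expires_in value. Refreshing 60 seconds early is a design choice to avoid sending a token that expires in flight, not something this guide requires.

```python
import time


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(self, fetch_token, clock=time.monotonic, margin=60):
        # fetch_token is a caller-supplied function (hypothetical here)
        # that returns (access_token, expires_in_seconds).
        self._fetch_token = fetch_token
        self._clock = clock
        self._margin = margin  # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a cached token, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, expires_in = self._fetch_token()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```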
Respect rate limits
| Endpoint | Limit |
|---|---|
| Token issuance | 10 requests/minute |
| Document ingestion | 60 requests/minute |
When rate-limited (429), wait and retry with backoff.
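Pacing requests on the client side avoids most 429 responses in the first place. The sketch below is a sliding-window limiter with a hypothetical class name. Whether the server counts requests in a sliding or a fixed window is not stated in this guide, so treat the window model as an assumption.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `limit` calls per `window` seconds (sliding window)."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self._limit = limit
        self._window = window
        self._clock = clock
        self._sent = deque()  # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) < self._limit:
            return 0.0
        return self._window - (now - self._sent[0])

    def record(self):
        """Note that a request was just sent."""
        self._sent.append(self._clock())
```

For the ingestion endpoint you would create RateLimiter(limit=60), sleep for wait_time() before each upload, and call record() after sending.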
Example Use Cases
HR system — no routing
An HRIS exports employee documents nightly. The system doesn't know about DocuHelix's cabinet structure.
curl -X POST https://api.docuhelix.com/api/v1/ingestion/documents \
-H "Authorization: Bearer $TOKEN" \
-H "X-Tenant-ID: $ORG_UID" \
-H "Idempotency-Key: workday-export-2026-04-03-emp-4821" \
-F "file=@w2-form.pdf" \
-F "external_source=workday" \
-F "external_reference_id=EMP-4821-W2-2025" \
-F "external_entity_type=Employee" \
-F "external_entity_id=emp-4821" \
-F "raw_document_type=W-2"
DocuHelix classifies the document, determines it belongs to the HR / Tax Documents cabinet, and files it automatically. If confidence is low, it goes to intake for review.
Mortgage system — full routing
A loan origination system knows exactly where each document belongs.
curl -X POST https://api.docuhelix.com/api/v1/ingestion/documents \
-H "Authorization: Bearer $TOKEN" \
-H "X-Tenant-ID: $ORG_UID" \
-H "Idempotency-Key: los-2026-04-03-loan-78432-cd" \
-F "file=@closing-disclosure.pdf" \
-F "industry_key=mortgage" \
-F "cabinet_uid=01JABC123CABINET456UIDXYZ" \
-F "module_key=closing-documents" \
-F "external_source=loan-origination-system" \
-F "external_reference_id=LOAN-2026-04-00183" \
-F "external_entity_type=Loan" \
-F "external_entity_id=loan-78432" \
-F 'metadata={"loan_number":"2026-04-00183","borrower":"Jane Doe","property_address":"123 Main St"}'
The document bypasses classification entirely and is promoted directly into the specified cabinet.
Accounting system — partial routing
An accounting system knows the cabinet but not the metadata module.
curl -X POST https://api.docuhelix.com/api/v1/ingestion/documents \
-H "Authorization: Bearer $TOKEN" \
-H "X-Tenant-ID: $ORG_UID" \
-H "Idempotency-Key: quickbooks-2026-04-03-inv-9912" \
-F "file=@invoice-9912.pdf" \
-F "cabinet_uid=01JXYZ789FINANCE000CABINET" \
-F "external_source=quickbooks" \
-F "external_reference_id=INV-9912" \
-F "external_entity_type=Vendor" \
-F "external_entity_id=vendor-331" \
-F "raw_document_type=Invoice"
The document is routed to the specified cabinet. DocuHelix classifies the document type and assigns the appropriate metadata module automatically.
Quick Reference
| Item | Value |
|---|---|
| Base URL | https://api.docuhelix.com |
| Auth endpoint | POST /api/v1/auth/token |
| Ingestion endpoint | POST /api/v1/ingestion/documents |
| Token lifetime | 30 minutes |
| Max file size | 500 MB |
| Auth rate limit | 10 req/min |
| Ingestion rate limit | 60 req/min |
| Tenant header | X-Tenant-ID |
