DocuHelix

Developer Documentation

Integration Guide

Everything you need to submit documents to DocuHelix via the API.

Base URL: https://api.docuhelix.com
Version: v1

Overview

The DocuHelix API allows external systems to submit documents for automated ingestion, classification, and storage within your organization's DocuHelix tenant.

How it works

  1. Authenticate: Exchange your API client credentials for a short-lived JWT access token.
  2. Submit: Upload a document file along with optional routing hints and metadata.
  3. Receive confirmation: Get an immediate acknowledgment of receipt with a tracking UID.
  4. Asynchronous processing: DocuHelix processes the document in the background:
    • Intake: The file is staged and queued for downstream processing.
    • Classification: The system determines the document type, industry, and target cabinet.
    • Promotion: The document is stored in the appropriate cabinet and becomes available in the DocuHelix app.

Important

Document processing is fully asynchronous. The API returns immediately after accepting the file. Processing status can be tracked through the DocuHelix app.

Authentication

Obtain an access token by posting your API client credentials to the token endpoint.

Endpoint

http
POST /api/v1/auth/token

Headers

| Header | Required | Description |
|---|---|---|
| Content-Type | Yes | application/json |
| X-Tenant-ID | Yes | Your organization UID |

Request body

json
{
  "client_id": "your-client-id",
  "client_secret": "your-client-secret"
}

Response (200 OK)

json
{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "bearer",
  "expires_in": 1800
}

| Field | Description |
|---|---|
| access_token | JWT to use in subsequent requests |
| token_type | Always "bearer" |
| expires_in | Token lifetime in seconds (default: 1800 = 30 minutes) |

Example

bash
curl -X POST https://api.docuhelix.com/api/v1/auth/token \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: 01J8KP4QXRN3MV5YZWT6H2ABCD" \
  -d '{
    "client_id": "01J9ABC123DEF456GHI789JKLM",
    "client_secret": "sk_live_a1b2c3d4e5f6g7h8i9j0..."
  }'
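
The same token exchange can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names build_token_request and fetch_token are hypothetical, and the endpoint, headers, and body fields are taken from the tables above.

```python
import json
import urllib.request

BASE_URL = "https://api.docuhelix.com"  # base URL from this guide


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/auth/token request."""
    body = json.dumps({"client_id": client_id, "client_secret": client_secret}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/auth/token",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-Tenant-ID": tenant_id},
    )


def fetch_token(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON body.

    On success the body carries access_token, token_type, and expires_in.
    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the request separately from sending it keeps the credentials handling easy to unit-test without touching the network.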

Error responses

| Status | Code | Description |
|---|---|---|
| 400 | missing_tenant | X-Tenant-ID header is missing |
| 401 | invalid_credentials | Client ID or secret is wrong |
| 403 | client_disabled | API client has been deactivated |
| 429 | | Rate limit exceeded (10 requests/minute) |

Request Headers

All authenticated requests require two headers:

| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <access_token> |
| X-Tenant-ID | Yes | Your organization UID (must match the token's org claim) |

The X-Tenant-ID value is validated against the organization claim in the JWT. If the two don't match, the request is rejected with a 403 tenant_mismatch error. This prevents cross-tenant access.
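
If you want to catch a mismatch before sending a request, you can inspect the token locally. The sketch below decodes the JWT payload without verifying the signature, which is acceptable only for a client-side sanity check. The claim name org_uid is an assumption; this guide does not name the claim, so inspect a real token to confirm it.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64url padding that JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))


def tenant_matches(token: str, tenant_id: str, claim: str = "org_uid") -> bool:
    """Return True when the token's organization claim equals the X-Tenant-ID value.

    The default claim name is hypothetical; pass the real one once you know it.
    """
    return jwt_claims(token).get(claim) == tenant_id
```

The server remains the authority: a local match does not guarantee acceptance, but a local mismatch will reliably predict a 403 tenant_mismatch.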

Document Ingestion

Submit a document for processing.

Endpoint

http
POST /api/v1/ingestion/documents

Headers

| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <access_token> |
| X-Tenant-ID | Yes | Your organization UID |
| Idempotency-Key | No | Unique key to prevent duplicate submissions |

Request body (multipart/form-data)

Required fields:

| Field | Type | Description |
|---|---|---|
| file | file | The document file (max 500 MB) |

Optional fields:

| Field | Type | Max Length | Description |
|---|---|---|---|
| filename | string | 255 | Override the original filename |
| industry_key | string | 80 | Industry identifier (e.g. "mortgage", "healthcare") |
| industry_uid | string | 26 | Industry ULID (validated against your tenant config) |
| cabinet_uid | string | 26 | Target cabinet ULID |
| cabinet_name | string | 255 | Target cabinet name (resolved to UID if cabinet_uid not provided) |
| module_key | string | 80 | Metadata module key |
| external_source | string | 200 | Name of the sending system (e.g. "salesforce", "workday") |
| external_reference_id | string | 200 | Your system's unique ID for this document |
| external_entity_type | string | 100 | Entity type in your system (e.g. "Employee", "Loan") |
| external_entity_id | string | 200 | Entity ID in your system |
| raw_document_type | string | 100 | Your classification label (e.g. "W-2", "Invoice") |
| metadata | JSON string | | Custom key-value metadata |
| notes | string | 5000 | Internal notes |
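
The limits in the table above can be checked client-side before uploading, which avoids a round trip that ends in a 400 validation_error. This is a sketch under two assumptions: lengths are counted in characters, and metadata must encode a JSON object. The function name check_fields is hypothetical.

```python
import json

# Maximum lengths copied from the optional-fields table.
MAX_LENGTHS = {
    "filename": 255,
    "industry_key": 80,
    "industry_uid": 26,
    "cabinet_uid": 26,
    "cabinet_name": 255,
    "module_key": 80,
    "external_source": 200,
    "external_reference_id": 200,
    "external_entity_type": 100,
    "external_entity_id": 200,
    "raw_document_type": 100,
    "notes": 5000,
}


def check_fields(fields: dict[str, str]) -> dict[str, list[str]]:
    """Return {field: [problems]} for the optional form fields; empty when all pass."""
    errors: dict[str, list[str]] = {}
    for name, value in fields.items():
        if name == "metadata":
            try:
                parsed = json.loads(value)
            except ValueError:
                errors[name] = ["must be a valid JSON string"]
            else:
                if not isinstance(parsed, dict):
                    errors[name] = ["must encode a JSON object of key-value pairs"]
        elif name not in MAX_LENGTHS:
            errors[name] = ["not a documented field"]
        elif len(value) > MAX_LENGTHS[name]:
            errors[name] = [f"longer than {MAX_LENGTHS[name]} characters"]
    return errors
```

The return shape mirrors the details object of the API's validation errors, so local and server-side problems can be reported the same way.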

Supported file types

| Category | Types |
|---|---|
| Documents | PDF, Word (.doc, .docx), Excel (.xls, .xlsx), PowerPoint (.ppt, .pptx) |
| Text | Plain text, CSV |
| Images | PNG, JPEG, TIFF |
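
A pre-flight check on file type and size can reject unsuitable files before any bytes are uploaded. The sketch below is illustrative only. The extensions for text and image types (.txt, .csv, .png, .jpg, .jpeg, .tif, .tiff) are assumptions, since the table lists type names rather than extensions, and "500 MB" is read conservatively as 500,000,000 bytes.

```python
from pathlib import Path

# Conservative reading of the documented 500 MB limit (assumption: decimal megabytes).
MAX_FILE_BYTES = 500 * 1000 * 1000

# Office and PDF extensions come from the table; the rest are assumed mappings.
ALLOWED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
    ".txt", ".csv",
    ".png", ".jpg", ".jpeg", ".tif", ".tiff",
}


def precheck_file(filename: str, size_bytes: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file extension: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; the limit is {MAX_FILE_BYTES}")
    return problems
```

An extension check is only a heuristic. The server may inspect the actual content, so treat a passing pre-check as a hint rather than a guarantee.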

Example

bash
curl -X POST https://api.docuhelix.com/api/v1/ingestion/documents \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
  -H "X-Tenant-ID: 01J8KP4QXRN3MV5YZWT6H2ABCD" \
  -F "file=@/path/to/document.pdf" \
  -F "industry_key=mortgage" \
  -F "cabinet_uid=01JABC123CABINET456UIDXYZ" \
  -F "external_source=loan-origination-system" \
  -F "external_reference_id=LOAN-2026-04-00183" \
  -F "external_entity_type=Loan" \
  -F "external_entity_id=loan-78432" \
  -F "raw_document_type=Closing Disclosure" \
  -F 'metadata={"loan_number":"2026-04-00183","borrower":"Jane Doe"}'
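
For integrations that cannot shell out to curl, the upload can be assembled with the Python standard library. The sketch below hand-encodes the multipart/form-data body; the helper names encode_multipart and build_ingestion_request are hypothetical. It holds the whole file in memory, so it suits small documents only. For files near the size limit, a streaming HTTP client is a better fit.

```python
import urllib.request
import uuid
from typing import Optional

BASE_URL = "https://api.docuhelix.com"  # base URL from this guide


def encode_multipart(fields: dict[str, str], filename: str, content: bytes) -> tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data upload.

    Assumes field names and the filename contain no double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    lines: list[bytes] = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_ingestion_request(
    token: str,
    tenant_id: str,
    fields: dict[str, str],
    filename: str,
    content: bytes,
    idempotency_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/ingestion/documents request."""
    body, content_type = encode_multipart(fields, filename, content)
    headers = {
        "Authorization": f"Bearer {token}",
        "X-Tenant-ID": tenant_id,
        "Content-Type": content_type,
    }
    if idempotency_key:
        headers["Idempotency-Key"] = idempotency_key
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/ingestion/documents", data=body, headers=headers, method="POST"
    )
```

Pass the returned request to urllib.request.urlopen to send it.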

Response (201 Created)

json
{
  "data": {
    "uid": "01JNQR7V3KXMSW8YBT4FH6PZEA",
    "org_uid": "01J8KP4QXRN3MV5YZWT6H2ABCD",
    "api_client_uid": "01J9ABC123DEF456GHI789JKLM",
    "correlation_uid": "01JNQR7V3KXMSW8YBT4FH6PZEB",
    "status": "processing",
    "content_sha256": "a3f2b8c1d4e5f6a7b8c9...f0a1",
    "original_filename": "closing-disclosure.pdf",
    "external_source": "loan-origination-system",
    "external_reference_id": "LOAN-2026-04-00183",
    "external_entity_type": "Loan",
    "external_entity_id": "loan-78432",
    "raw_document_type": "Closing Disclosure",
    "industry_key": "mortgage",
    "industry_uid": null,
    "cabinet_uid": "01JABC123CABINET456UIDXYZ",
    "module_key": null,
    "received_at": "2026-04-03T21:15:00.000000Z",
    "processed_at": null,
    "created_at": "2026-04-03T21:15:00.000000Z",
    "updated_at": "2026-04-03T21:15:00.000000Z"
  }
}

Store the uid value. This is the ingestion request tracking identifier.
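
One way to capture the tracking identifier is to parse the response into a small record and persist it next to your own reference. A minimal sketch follows; the IngestionReceipt type and parse_receipt function are hypothetical names, and only fields shown in the response above are read.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IngestionReceipt:
    """The parts of the 201 response worth storing in the source system."""

    uid: str                               # DocuHelix tracking identifier
    status: str                            # e.g. "processing"
    external_reference_id: Optional[str]   # echoes the value you submitted, if any


def parse_receipt(response_body: str) -> IngestionReceipt:
    """Extract the tracking fields from the JSON body of a successful submission."""
    data = json.loads(response_body)["data"]
    return IngestionReceipt(
        uid=data["uid"],
        status=data["status"],
        external_reference_id=data.get("external_reference_id"),
    )
```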

Routing Behavior

How DocuHelix routes your document depends on how much context you provide.

Full routing (fastest path)

When you provide industry_key (or industry_uid) + cabinet_uid + module_key:

  • All values are validated against your tenant configuration.
  • If valid, the document is promoted directly into the target cabinet with the specified metadata schema.
  • No classification step is needed.

Partial routing

When you provide some routing fields but not all (e.g. cabinet_uid only):

  • Provided fields are validated.
  • The system resolves the missing fields using classification.

No routing

When you provide only the file and no routing fields:

  • The system runs full classification: industry detection, cabinet assignment, and document type identification.
  • This path takes longer but requires no knowledge of your tenant's configuration.
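
The three routing paths above can be predicted from the fields you are about to send, which is useful for logging or for warning when a submission will fall back to classification. The sketch below follows the rules as described in this section. It treats cabinet_name alone as partial routing, which is an assumption: the guide defines full routing in terms of cabinet_uid only. The function name routing_mode is hypothetical.

```python
def routing_mode(fields: dict[str, str]) -> str:
    """Return "full", "partial", or "none" for a set of form fields."""
    has_industry = bool(fields.get("industry_key") or fields.get("industry_uid"))
    has_cabinet_uid = bool(fields.get("cabinet_uid"))
    has_module = bool(fields.get("module_key"))

    # Full routing: industry (key or UID) + cabinet_uid + module_key.
    if has_industry and has_cabinet_uid and has_module:
        return "full"
    # Partial routing: at least one routing hint, but not the full set.
    if has_industry or has_cabinet_uid or has_module or fields.get("cabinet_name"):
        return "partial"
    # No routing: the system runs full classification.
    return "none"
```

This only predicts the path. The server still validates every routing value against your tenant configuration.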

Low confidence

When classification confidence is below the configured threshold:

  • The document enters intake (a review queue).
  • A team member reviews and manually assigns the correct routing in the DocuHelix app.
  • No document is lost — it simply waits for human confirmation.

Recommendation: Provide as much routing information as possible. Full routing skips classification entirely and gives the fastest, most predictable results.

Response Behavior

The API is asynchronous. When you submit a document, the response confirms that DocuHelix has accepted and queued it — not that processing is complete.

Status lifecycle

| Status | Meaning |
|---|---|
| received | File accepted by the API |
| processing | File staged and queued for classification |
| processed | Handed off to the classification pipeline |
| rejected | Rejected (e.g. plan limits exceeded) |
| failed | An error occurred during processing |

The initial response typically returns processing. The document moves through the remaining statuses asynchronously.
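
When handling the status field in client code, an enumeration avoids typos in string comparisons. This is a sketch only. Treating processed, rejected, and failed as terminal is an assumption drawn from the table above, and the names IngestionStatus, is_terminal, and needs_attention are hypothetical.

```python
from enum import Enum


class IngestionStatus(str, Enum):
    RECEIVED = "received"
    PROCESSING = "processing"
    PROCESSED = "processed"
    REJECTED = "rejected"
    FAILED = "failed"


# Assumption: no further status changes happen after these values.
TERMINAL_STATUSES = frozenset(
    {IngestionStatus.PROCESSED, IngestionStatus.REJECTED, IngestionStatus.FAILED}
)
ERROR_STATUSES = frozenset({IngestionStatus.REJECTED, IngestionStatus.FAILED})


def is_terminal(status: str) -> bool:
    """True when the status is not expected to change again. Raises ValueError on unknown values."""
    return IngestionStatus(status) in TERMINAL_STATUSES


def needs_attention(status: str) -> bool:
    """True for statuses that indicate the document was not ingested."""
    return IngestionStatus(status) in ERROR_STATUSES
```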

Error Handling

All error responses follow the same envelope:

json
{
  "error": {
    "code": "error_code",
    "message": "Human-readable description",
    "details": {}
  }
}

Status codes

| Status | Code | Description |
|---|---|---|
| 400 | missing_tenant | X-Tenant-ID header is missing or empty |
| 400 | validation_error | Request body failed validation (see details for field errors) |
| 401 | invalid_credentials | Wrong client_id or client_secret |
| 401 | missing_token | No Authorization: Bearer header |
| 401 | invalid_token | JWT is expired, malformed, or has an invalid signature |
| 403 | client_disabled | API client has been deactivated by an admin |
| 403 | insufficient_scope | Token lacks the documents:ingest scope |
| 403 | tenant_mismatch | X-Tenant-ID doesn't match the token's organization |
| 422 | validation_error | Routing validation failed (invalid industry, cabinet, or module) |
| 429 | | Rate limit exceeded |
| 500 | server_error | Unexpected internal error |

Validation error example (422)

json
{
  "error": {
    "code": "validation_error",
    "message": "Validation error",
    "details": {
      "cabinet_uid": ["Cabinet not found in this organization."],
      "module_key": ["Module key not found for this organization."]
    }
  }
}
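
Because every error shares this envelope, a single parser can turn any failed response into a structured exception. The sketch below is one possible shape. ApiError and parse_error are hypothetical names, and the fallback for non-JSON bodies (for example, an error page from an intermediate proxy) is a defensive assumption rather than documented behavior.

```python
import json


class ApiError(Exception):
    """Hypothetical exception type wrapping the documented error envelope."""

    def __init__(self, status: int, code: str, message: str, details: dict):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details

    def field_errors(self) -> list[str]:
        """Flatten details such as {"cabinet_uid": ["..."]} into "field: message" lines."""
        lines = []
        for field, messages in self.details.items():
            if not isinstance(messages, list):
                messages = [messages]
            lines.extend(f"{field}: {message}" for message in messages)
        return lines


def parse_error(status: int, body: str) -> ApiError:
    """Build an ApiError from an HTTP status and response body."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None  # body was not JSON at all
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    details = error.get("details")
    return ApiError(
        status,
        error.get("code", "unknown"),
        error.get("message", body[:200]),
        details if isinstance(details, dict) else {},
    )
```

Calling field_errors() on the 422 example above yields one line per failing field, which is convenient for logs.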

Best Practices

Provide routing when possible

Full routing (industry_key + cabinet_uid + module_key) skips classification, reduces processing time, and guarantees deterministic placement.

Use idempotency keys

Include an Idempotency-Key header to prevent duplicate documents when retrying failed requests. If a request with the same key was already processed successfully, the API returns the existing result instead of creating a duplicate.

bash
curl -X POST https://api.docuhelix.com/api/v1/ingestion/documents \
  -H "Authorization: Bearer ..." \
  -H "X-Tenant-ID: ..." \
  -H "Idempotency-Key: import-batch-2026-04-03-doc-00183" \
  -F "file=@document.pdf"
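
A key is only useful if every retry of the same submission produces the same value, so it helps to derive it from stable inputs rather than generate it randomly. The sketch below combines the source system, your reference ID, and a short content hash. The format is an illustration, not a DocuHelix requirement, and the function name idempotency_key is hypothetical.

```python
import hashlib


def idempotency_key(external_source: str, external_reference_id: str, content: bytes) -> str:
    """Derive a deterministic key: the same document from the same source always maps to the same key."""
    digest = hashlib.sha256(content).hexdigest()[:16]  # short fingerprint of the file bytes
    return f"{external_source}-{external_reference_id}-{digest}"
```

Including the content hash means a corrected file sent under the same reference ID gets a new key and is not mistaken for a retry.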

Store the external reference

Always include external_reference_id and external_source. This creates a traceable link between DocuHelix and your source system, making it easy to reconcile records and debug issues.

Retry safely

If a request fails with a 5xx error or a network timeout:

  1. Retry with the same Idempotency-Key.
  2. Use exponential backoff (1s, 2s, 4s, 8s...).
  3. Cap retries at 5 attempts.

Do not retry other 4xx errors; fix the request first. The exception is 429, which should be retried after a wait (see Respect rate limits).
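
The retry steps above can be sketched as a small wrapper. It assumes "5 attempts" means five calls in total, which produces waits of 1s, 2s, 4s, and 8s between them. RetryableError and call_with_retry are hypothetical names; the caller is expected to raise RetryableError for 5xx responses and timeouts and to reuse the same Idempotency-Key on every attempt.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Raised by the caller's send function for 5xx responses and network timeouts."""


def is_retryable_status(status: int) -> bool:
    """5xx errors are retryable, and so is 429 (after a wait); other 4xx errors are not."""
    return status == 429 or 500 <= status <= 599


def call_with_retry(
    send: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send() up to max_attempts times, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, 8s with the defaults
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so the backoff schedule can be tested without real delays. Adding random jitter to each wait is a common refinement when many clients retry at once.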

Cache your access token

Tokens are valid for 30 minutes. Request a new token only when the current one expires. Do not request a new token for every API call.
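
A small cache that refreshes shortly before expiry keeps token requests well under the issuance limit. In the sketch below, TokenCache is a hypothetical helper, fetch is any function returning the token endpoint's JSON body, and the 60-second safety margin is an arbitrary choice to avoid using a token that expires mid-request.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Return a cached access token, fetching a new one only when the old one is about to expire."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        clock: Callable[[], float] = time.monotonic,
        safety_margin: float = 60.0,
    ):
        self._fetch = fetch
        self._clock = clock
        self._margin = safety_margin
        self._token: Optional[str] = None
        self._refresh_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._refresh_at:
            response = self._fetch()  # expected keys: access_token, expires_in
            self._token = response["access_token"]
            # Refresh slightly early so a token never expires while a request is in flight.
            self._refresh_at = now + response["expires_in"] - self._margin
        return self._token
```

With the default 1800-second lifetime, this issues roughly one token request every 29 minutes regardless of how many documents are submitted. The class is not thread-safe; guard get() with a lock if several threads share one cache.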

Respect rate limits

| Endpoint | Limit |
|---|---|
| Token issuance | 10 requests/minute |
| Document ingestion | 60 requests/minute |

When rate-limited (429), wait and retry with backoff.
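
Batch jobs can avoid most 429 responses by pacing themselves on the client side. The sketch below is a sliding-window limiter. Whether the server counts requests in a sliding or a fixed window is not stated in this guide, so treat this as an approximation. SlidingWindowLimiter is a hypothetical name.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_calls calls within any period-second window."""

    def __init__(self, max_calls: int, period: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._stamps: deque = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if it is allowed now)."""
        now = self._clock()
        # Drop calls that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._period:
            self._stamps.popleft()
        if len(self._stamps) < self._max_calls:
            return 0.0
        return self._period - (now - self._stamps[0])

    def record(self) -> None:
        """Note that a call was just made."""
        self._stamps.append(self._clock())
```

For document ingestion, create the limiter with max_calls=60. Before each upload, sleep for wait_time() seconds, send the request, then call record().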

Example Use Cases

HR system — no routing

An HRIS exports employee documents nightly. The system doesn't know about DocuHelix's cabinet structure.

bash
curl -X POST https://api.docuhelix.com/api/v1/ingestion/documents \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Tenant-ID: $ORG_UID" \
  -H "Idempotency-Key: workday-export-2026-04-03-emp-4821" \
  -F "file=@w2-form.pdf" \
  -F "external_source=workday" \
  -F "external_reference_id=EMP-4821-W2-2025" \
  -F "external_entity_type=Employee" \
  -F "external_entity_id=emp-4821" \
  -F "raw_document_type=W-2"

DocuHelix classifies the document, determines it belongs to the HR / Tax Documents cabinet, and files it automatically. If confidence is low, it goes to intake for review.

Mortgage system — full routing

A loan origination system knows exactly where each document belongs.

bash
curl -X POST https://api.docuhelix.com/api/v1/ingestion/documents \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Tenant-ID: $ORG_UID" \
  -H "Idempotency-Key: los-2026-04-03-loan-78432-cd" \
  -F "file=@closing-disclosure.pdf" \
  -F "industry_key=mortgage" \
  -F "cabinet_uid=01JABC123CABINET456UIDXYZ" \
  -F "module_key=closing-documents" \
  -F "external_source=loan-origination-system" \
  -F "external_reference_id=LOAN-2026-04-00183" \
  -F "external_entity_type=Loan" \
  -F "external_entity_id=loan-78432" \
  -F 'metadata={"loan_number":"2026-04-00183","borrower":"Jane Doe","property_address":"123 Main St"}'

The document bypasses classification entirely and is promoted directly into the specified cabinet.

Accounting system — partial routing

An accounting system knows the cabinet but not the metadata module.

bash
curl -X POST https://api.docuhelix.com/api/v1/ingestion/documents \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Tenant-ID: $ORG_UID" \
  -H "Idempotency-Key: quickbooks-2026-04-03-inv-9912" \
  -F "file=@invoice-9912.pdf" \
  -F "cabinet_uid=01JXYZ789FINANCE000CABINET" \
  -F "external_source=quickbooks" \
  -F "external_reference_id=INV-9912" \
  -F "external_entity_type=Vendor" \
  -F "external_entity_id=vendor-331" \
  -F "raw_document_type=Invoice"

The document is routed to the specified cabinet. DocuHelix classifies the document type and assigns the appropriate metadata module automatically.

Quick Reference

| Item | Value |
|---|---|
| Base URL | https://api.docuhelix.com |
| Auth endpoint | POST /api/v1/auth/token |
| Ingestion endpoint | POST /api/v1/ingestion/documents |
| Token lifetime | 30 minutes |
| Max file size | 500 MB |
| Auth rate limit | 10 req/min |
| Ingestion rate limit | 60 req/min |
| Tenant header | X-Tenant-ID |