Upload Files

POST /upload/upload_document
curl --request POST \
  --url https://api.usecortex.ai/upload/upload_document \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form file_id=CortexDoc1234 \
  --form 'tenant_metadata=<string>' \
  --form 'document_metadata=<string>' \
  --form file=@example-file
{
  "file_id": "CortexDoc1234",
  "message": "<string>",
  "success": true
}

Sample Request

curl --request POST \
  --url 'https://api.usecortex.ai/upload/upload_document?tenant_id=tenant_1234' \
  --header 'Authorization: Bearer 3e0a174b-bd11-414c-8259-9612a4b99a3f' \
  --header 'Content-Type: multipart/form-data' \
  --form 'file_id=doc_123456' \
  --form 'tenant_metadata={}' \
  --form 'document_metadata={}' \
  --form file=@example-file
Upload documents to your tenant’s knowledge base for processing, chunking, and indexing to enable search and retrieval.
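
For readers calling the endpoint from Python, the sample request above can be sketched with the standard library alone. This is a minimal illustration, not an official client: the function name build_upload_request and its argument names are assumptions, and the multipart body is assembled by hand without escaping special characters in the filename. It only builds the request, so nothing is sent until you pass it to urllib.request.urlopen.

```python
import json
import urllib.parse
import urllib.request
import uuid

API_URL = "https://api.usecortex.ai/upload/upload_document"


def build_upload_request(token, tenant_id, filename, file_bytes,
                         file_id=None, tenant_metadata=None,
                         document_metadata=None, sub_tenant_id=None):
    """Build (but do not send) a multipart upload request. Hypothetical helper."""
    # tenant_id is listed as a required query parameter; sub_tenant_id is optional.
    query = {"tenant_id": tenant_id}
    if sub_tenant_id:
        query["sub_tenant_id"] = sub_tenant_id
    url = API_URL + "?" + urllib.parse.urlencode(query)

    # Text form fields. Both metadata fields are sent as JSON strings.
    fields = {}
    if file_id:
        fields["file_id"] = file_id
    if tenant_metadata is not None:
        fields["tenant_metadata"] = json.dumps(tenant_metadata)
    if document_metadata is not None:
        fields["document_metadata"] = json.dumps(document_metadata)

    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The file part carries the raw bytes of the document.
    parts.append(
        (f'--{boundary}\r\n'
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         'Content-Type: application/octet-stream\r\n\r\n').encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Keeping request construction separate from sending makes the request easy to inspect in tests; sending it would be a matter of calling urllib.request.urlopen(req) and reading the JSON body.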

Supported file formats

Complete Reference: For a comprehensive list of all supported file formats with detailed information, see our Supported File Formats documentation.
Unsupported File Formats: If you attempt to upload a file in an unsupported format, you will receive an error response with status code 400 and the message: "Unsupported file format: [filename]. Please check our supported file formats documentation." Ensure your files are in one of the formats listed in the Supported File Formats documentation before uploading.
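
A client can recognize this rejection and report which file was refused. The sketch below is an assumption-laden illustration: the helper name unsupported_filename is hypothetical, and because the exact shape of the error body is not described here, it simply searches the raw response text for the documented message.

```python
import re

# Pattern taken from the documented error message text.
_UNSUPPORTED = re.compile(r"Unsupported file format: (.+)\. Please check")


def unsupported_filename(status_code, body_text):
    """Return the rejected filename, or None if this is not that error."""
    if status_code != 400:
        return None
    match = _UNSUPPORTED.search(body_text)
    return match.group(1) if match else None
```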

Document Processing Pipeline

When you upload a document, it goes through a comprehensive processing pipeline designed to make your content searchable and retrievable:

1. Immediate Upload & Queue

  • Your document is immediately accepted and stored securely
  • It’s added to our processing queue for background processing
  • You receive a confirmation response with a file_id for tracking

2. Processing Phase

Our system automatically handles:
  • Content Extraction: Extracting text from various formats (PDF, DOCX, TXT, etc.)
  • Document Parsing: Understanding document structure, headers, and formatting
  • Text Cleaning: Removing formatting artifacts and normalizing content

3. Intelligent Chunking

  • Documents are split into semantically meaningful chunks
  • Chunk size is optimized for both context preservation and search accuracy
  • Overlapping boundaries ensure no information is lost between chunks
  • Metadata is preserved and associated with each chunk
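
To make the idea of overlapping boundaries concrete, here is a generic sliding-window sketch. It is not the chunker used by the service, whose algorithm and sizes are not documented here; the function name and the word-based windowing are illustrative assumptions.

```python
def chunk_with_overlap(words, size, overlap):
    """Split a list of words into windows of `size` sharing `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        # Stop once a window has reached the end of the input.
        if start + size >= len(words):
            break
    return chunks
```

With size 4 and overlap 2, each chunk repeats the last two words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.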

4. Embedding Generation

  • Each chunk is converted into high-dimensional vector embeddings
  • Embeddings capture semantic meaning and context
  • Vectors are optimized for similarity search and retrieval
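
Similarity search over embeddings usually means comparing vectors with a measure such as cosine similarity. The sketch below shows that measure in plain Python as general background; which metric the service actually uses is not stated here, so treat the choice as an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; report no similarity.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0, regardless of their lengths.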

5. Indexing & Database Updates

  • Embeddings are stored in our vector database for fast similarity search
  • Full-text search indexes are created for keyword-based queries
  • Metadata is indexed for filtering and faceted search
  • Cross-references are established for related documents

6. Quality Assurance

  • Automated quality checks ensure processing accuracy
  • Content validation verifies extracted text completeness
  • Embedding quality is assessed for optimal retrieval performance
Processing Time: Most documents are fully processed and searchable within 1-5 minutes. Larger documents (100+ pages) may take up to 15 minutes. You can check processing status using the file_id returned in the response.
Default Sub-Tenant Behavior: If you don’t specify a sub_tenant_id, the document will be uploaded to the default sub-tenant created when your tenant was set up. This is perfect for organization-wide documents that should be accessible across all departments.
File ID Management: The system uses a priority-based approach for file ID assignment:
  1. First Priority: If you provide a file_id as a direct body parameter, that specific ID will be used
  2. Second Priority: If no direct file_id is provided, the system checks for a file_id in the document_metadata object
  3. Auto-Generation: If neither source provides a file_id, the system will automatically generate a unique identifier
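
The priority order above can be mirrored on the client to predict which ID a document will end up with. This is a sketch of the documented rules, not the server's code: resolve_file_id is a hypothetical name, and the UUID fallback only stands in for the auto-generated identifier, whose real format is not documented here.

```python
import uuid


def resolve_file_id(file_id=None, document_metadata=None):
    """Apply the documented priority: direct parameter, then metadata, then generated."""
    # 1. A non-empty direct file_id wins (the parameter defaults to "").
    if file_id:
        return file_id
    # 2. Otherwise fall back to file_id inside document_metadata.
    if document_metadata and document_metadata.get("file_id"):
        return document_metadata["file_id"]
    # 3. Otherwise generate one (placeholder format, an assumption).
    return uuid.uuid4().hex
```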

Duplicate File ID Behavior

When you upload a document with a file_id that already exists in your tenant:
  • Overwrite Behavior: The existing document with the same file_id will be completely replaced with the new document
  • Processing: The new document will go through the full processing pipeline (content extraction, chunking, embedding generation, indexing)
  • Search Results: Previous search results and embeddings from the old document will be replaced with the new document’s content
  • Idempotency: Uploading the same document with the same file_id multiple times is safe and will result in the same final state
Important: When overwriting an existing document, all previous chunks, embeddings, and search indexes associated with that file_id will be permanently removed and replaced. This action cannot be undone.
Example Success Response for Duplicate File ID:
{
  "message": "Document uploaded successfully. Existing document with file_id 'doc_123456' has been overwritten.",
  "file_id": "doc_123456",
  "success": true
}
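
Because re-uploading under the same file_id replaces the earlier document, one way to take advantage of this is to derive the file_id deterministically from something stable, such as the document's path in your own system. The sketch below is only a suggestion: stable_file_id, the doc_ prefix, and the 16-character digest length are all illustrative assumptions.

```python
import hashlib


def stable_file_id(source_key, prefix="doc_"):
    """Derive a repeatable file_id from a stable key such as a source path."""
    digest = hashlib.sha256(source_key.encode("utf-8")).hexdigest()
    # Truncated for readability; keep more characters to lower collision risk.
    return prefix + digest[:16]
```

With this approach a retried or repeated upload of the same source overwrites the previous copy instead of creating a second document, at the cost of permanently replacing the earlier version as described above.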

Processing Status & Monitoring

After uploading, you can monitor your document’s processing status:

Immediate Response

Upon successful upload, you’ll receive:
{
  "message": "Document uploaded successfully",
  "file_id": "doc_123456"
}

Processing States

Your document will progress through these states:
  • queued: Document is in the processing queue, waiting to be processed
  • in_progress: Document is actively being processed (includes content extraction, chunking, embedding generation, and indexing)
  • success: Document is fully processed and searchable
  • errored: Processing encountered an error (rare occurrence)
In-Progress Details: While the status shows in_progress, the system is actually performing multiple steps: content extraction, document parsing, intelligent chunking, embedding generation, and database indexing. These happen sequentially but are all part of the single in_progress state.
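
A client typically polls until the document reaches one of the two terminal states. The sketch below shows such a loop under stated assumptions: the status endpoint is not described in this section, so fetch_status is a hypothetical callable you supply that returns one of the four state strings for a file_id. The 15-minute default timeout follows the upper processing estimate given earlier.

```python
import time

TERMINAL_STATES = {"success", "errored"}
KNOWN_STATES = {"queued", "in_progress"} | TERMINAL_STATES


def wait_until_processed(fetch_status, file_id, interval=15.0, timeout=900.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status(file_id) until it returns 'success' or 'errored'."""
    deadline = clock() + timeout
    while True:
        state = fetch_status(file_id)
        if state not in KNOWN_STATES:
            raise ValueError(f"unexpected state: {state!r}")
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"{file_id} still {state} after {timeout} seconds")
        # queued or in_progress: wait before asking again.
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised in tests without real waiting.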

When Your Document is Ready

Once processing is complete, your document will be:
  • Searchable via semantic search and Q&A endpoints
  • Retrievable through our retrieval APIs
  • Available for AI-powered applications
  • Indexed for fast query performance
Important: Don’t attempt to search or retrieve your document immediately after upload. Wait for processing to complete (typically 1-5 minutes) to ensure optimal results.

Best Practices

Document Preparation

  • File Size: Documents up to 50 MB are processed efficiently
  • Content Quality: Clear, well-structured documents produce better embeddings
  • Metadata: Include rich metadata for better filtering and organization

Processing Optimization

  • Batch Uploads: For multiple documents, consider using our batch upload endpoint
  • Metadata Consistency: Use consistent metadata schemas across your organization
  • File Naming: Descriptive filenames help with document identification

Troubleshooting

Document Not Appearing in Search?
  • Wait for processing to complete: typically 1-5 minutes, or up to 15 minutes for larger documents (100+ pages)
  • Check if the document status is errored (rare occurrence)
  • Verify your search query and filters
Slow Processing?
  • Large documents (100+ pages) take longer to process
  • Complex formatting may require additional processing time
  • High system load may temporarily slow processing
Processing Failures?
  • If status shows errored, ensure your document isn’t corrupted or password-protected
  • Check that the file format is supported (see Supported File Formats section above)
  • Verify your API key has sufficient permissions
  • For unsupported formats, you’ll receive a 400 error with the message: "Unsupported file format: [filename]. Please check our supported file formats documentation."
Need Help? If a document fails to process or you’re experiencing issues, contact our support team with the file_id for assistance.

Error Responses

All endpoints return consistent error responses following the standard format. For detailed error information, see our Error Responses documentation.

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Query Parameters

tenant_id (string, required)

Unique identifier for the tenant/organization

Example:

"tenant_1234"

sub_tenant_id (string, default: "")

Optional sub-tenant identifier used to organize data within a tenant. If omitted, the default sub-tenant created during tenant setup will be used.

Example:

"sub_tenant_4567"

Body

multipart/form-data

file (file, required)

The document file to upload (e.g., PDF, DOCX, TXT)

file_id (string | null, default: "")

Optional file ID for the uploaded content. If not provided here or in document_metadata, one is generated automatically.

Example:

"CortexDoc1234"

tenant_metadata (string | null)

JSON string containing tenant-level document metadata (e.g., department, compliance_tag)

Example:

"{\"department\":\"Finance\",\"compliance_tag\":\"GDPR\"}"

document_metadata (string | null)

JSON string containing document-specific metadata (e.g., title, author, file_id). A file_id given here is used only when no direct file_id body parameter is provided; if neither is provided, the system generates an ID automatically.

Example:

"{\"title\":\"Q1 Report.pdf\",\"author\":\"Alice Smith\",\"file_id\":\"custom_file_123\"}"

Response

Successful Response

file_id (string, required)

Unique identifier for the file being processed

Example:

"CortexDoc1234"

message (string, required)

Status message indicating that document parsing was scheduled or that an update completed

success (boolean, default: true)

Example:

true
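
The response fields above map naturally onto a small typed structure. The following is a sketch rather than an official model: UploadResponse and parse_upload_response are hypothetical names, and success falls back to true when absent because the schema lists that as its default.

```python
import json
from dataclasses import dataclass


@dataclass
class UploadResponse:
    file_id: str
    message: str
    success: bool = True


def parse_upload_response(body_text):
    """Parse the JSON body of a successful upload response."""
    data = json.loads(body_text)
    return UploadResponse(
        file_id=data["file_id"],    # required by the schema
        message=data["message"],    # required by the schema
        success=data.get("success", True),
    )
```

A missing file_id or message raises KeyError, which surfaces a malformed response early instead of passing an incomplete object downstream.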
