Batch Processing Pipeline
When you upload multiple documents, each document goes through our comprehensive processing pipeline in parallel:
1. Immediate Upload & Queue
- All documents are immediately accepted and stored securely
- Each document is added to our processing queue for background processing
- You receive a confirmation response with individual `file_id`s for tracking each file
2. Parallel Processing Phase
Each document is processed independently with:
- Content Extraction: Extracting text from various supported formats (see the Supported File Formats section below)
- Document Parsing: Understanding document structure, headers, and formatting
- Text Cleaning: Removing formatting artifacts and normalizing content
3. Intelligent Chunking
- Each document is split into semantically meaningful chunks
- Chunk size is optimized for both context preservation and search accuracy
- Overlapping boundaries ensure no information is lost between chunks
- Metadata is preserved and associated with each chunk
4. Embedding Generation
- Each chunk is converted into high-dimensional vector embeddings
- Embeddings capture semantic meaning and context
- Vectors are optimized for similarity search and retrieval
5. Indexing & Database Updates
- Embeddings are stored in our vector database for fast similarity search
- Full-text search indexes are created for keyword-based queries
- Metadata is indexed for filtering and faceted search
- Cross-references are established between related documents
6. Quality Assurance
- Automated quality checks ensure processing accuracy for each document
- Content validation verifies extracted text completeness
- Embedding quality is assessed for optimal retrieval performance
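The overlapping-boundary idea in step 3 can be illustrated with a simple sliding window. This is a minimal sketch under stated assumptions: it splits on fixed character counts rather than the semantic boundaries Cortex uses, and the `chunk_size` and `overlap` values are hypothetical, not the service's actual settings.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    chunk_size and overlap are illustrative values, not Cortex's settings.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this far after the previous one
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Because consecutive windows share `overlap` characters, any passage no longer than `overlap` that straddles a boundary is still contained whole in at least one chunk.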
If you omit `sub_tenant_id`, all documents will be uploaded to the default sub-tenant created when your tenant was set up. This is ideal for organization-wide document batches that should be accessible across all departments.
Recommended: For optimal performance, limit each batch to a maximum of 20 sources per request. When sending multiple batch requests, wait 1 second between consecutive requests.
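The batch-size and interval recommendations translate into a simple client loop. The sketch below assumes a hypothetical `send_batch` callback that performs a single upload request; the sleep function is injectable so the pacing logic can be tested without actually waiting.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

MAX_BATCH_SIZE = 20       # recommended maximum sources per request
INTERVAL_SECONDS = 1.0    # recommended pause between batch requests


def batched(items: Sequence[T], size: int = MAX_BATCH_SIZE) -> List[List[T]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def upload_in_batches(
    items: Sequence[T],
    send_batch: Callable[[List[T]], None],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send each batch via the hypothetical `send_batch` callback.

    Returns the number of requests sent.
    """
    batches = batched(items)
    for index, batch in enumerate(batches):
        if index > 0:
            sleep(INTERVAL_SECONDS)  # wait before every request after the first
        send_batch(batch)
    return len(batches)
```

For 45 documents this issues three requests (20, 20 and 5 sources) with two 1-second pauses in between.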
File ID Management: The system uses a priority-based approach for file ID assignment:
- First Priority: If you provide a `file_id` as a direct body parameter, that specific ID will be used
- Second Priority: If no direct `file_id` is provided, the system checks for a `file_id` in the `document_metadata` object
- Auto-Generation: If neither source provides a `file_id`, the system will automatically generate a unique identifier
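The priority order above can be expressed as a small helper. This is an illustrative client-side mirror of the documented rules, not the service's code: `resolve_file_id` is a hypothetical name, and the UUID format in the fallback is an assumption (generated IDs such as `CortexDoc1234` in the sample response follow an unspecified format).

```python
import uuid
from typing import Any, Mapping, Optional


def resolve_file_id(
    direct_file_id: Optional[str],
    document_metadata: Optional[Mapping[str, Any]],
) -> str:
    """Pick a file ID using the documented priority order."""
    # 1. A file_id passed directly as a body parameter wins (empty counts as missing).
    if direct_file_id:
        return direct_file_id
    # 2. Otherwise fall back to a file_id inside document_metadata.
    if document_metadata and document_metadata.get("file_id"):
        return str(document_metadata["file_id"])
    # 3. Otherwise generate a unique identifier (UUID4 hex is an assumption).
    return uuid.uuid4().hex
```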
Duplicate File ID Behavior
When you upload documents with `file_id`s that already exist in your tenant:
- Overwrite Behavior: Each existing document with a matching `file_id` will be completely replaced with the new document
- Processing: Each new document will go through the full processing pipeline independently
- Search Results: Previous search results and embeddings from old documents will be replaced with the new documents’ content
- Idempotency: Uploading the same documents with the same `file_id`s multiple times is safe and will result in the same final state
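Because re-uploading with an existing `file_id` overwrites the earlier document, one way to make retries safe is to derive each `file_id` from the file's contents. This is an assumption-based pattern, not something the API requires: it hashes the bytes with SHA-256, adds a hypothetical `doc_` prefix, and assumes the API accepts IDs of this shape (it does accept custom values such as `custom_file_123`).

```python
import hashlib


def content_file_id(data: bytes, prefix: str = "doc_") -> str:
    """Derive a stable file ID from the file's bytes.

    Identical content always maps to the same ID, so re-uploading after a
    timeout overwrites the earlier copy instead of creating a duplicate.
    """
    digest = hashlib.sha256(data).hexdigest()
    return prefix + digest[:32]  # truncated hex digest; prefix is hypothetical
```

The trade-off: editing a file produces a new ID, so the old version is not replaced. If updates should overwrite, derive the ID from a stable key such as a path or record ID instead.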
Existing documents with matching `file_id`s will be permanently removed and replaced. This action cannot be undone.
Supported File Formats
Cortex supports a comprehensive range of file formats for document processing. Files are automatically parsed and their content extracted for indexing and search.
If you upload a file in an unsupported format, you will receive a `400` error with the message: "Unsupported file format: [filename]. Please check our supported file formats documentation."
Ensure your files are in one of the supported formats before uploading.
Best Practices
Document Preparation
- File Size: Documents up to 50MB are processed efficiently
- Content Quality: Clear, well-structured documents produce better embeddings
- Metadata: Include rich metadata for better filtering and organization
Processing Optimization
- Batch Size: Limit each batch to a maximum of 20 sources per request
- Request Intervals: When sending multiple batch requests, wait 1 second between consecutive requests
- Metadata Consistency: Use consistent metadata schemas across your organization
- File Naming: Descriptive filenames help with document identification
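A pre-upload check can flag files that fall outside these guidelines before you send them. This sketch assumes you already know each file's size and intended `document_metadata`; it treats 50MB as 50 MiB (the unit base is not specified) and checks metadata against a required key set that you define.

```python
from typing import Dict, Iterable, List, Mapping, Set, Tuple

# "Documents up to 50MB" guideline, interpreted here as 50 MiB (an assumption).
SIZE_GUIDELINE_BYTES = 50 * 1024 * 1024


def preflight(
    files: Iterable[Tuple[str, int, Mapping[str, object]]],
    required_keys: Set[str],
) -> Dict[str, List[str]]:
    """Return warnings per filename for oversized files or inconsistent metadata.

    Each item is (filename, size_in_bytes, document_metadata).
    """
    warnings: Dict[str, List[str]] = {}
    for name, size, metadata in files:
        issues: List[str] = []
        if size > SIZE_GUIDELINE_BYTES:
            issues.append("larger than 50 MB; may process less efficiently")
        missing = required_keys - set(metadata)  # keys your schema expects but are absent
        if missing:
            issues.append("missing metadata keys: " + ", ".join(sorted(missing)))
        if issues:
            warnings[name] = issues
    return warnings
```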
Troubleshooting
Documents Not Appearing in Search?
- Wait 5-10 minutes for processing to complete
- Check whether any document status is `errored` (rare occurrence)
- Verify your search query and filters
- Large documents (100+ pages) take longer to process
- Complex formatting may require additional processing time
- High system load may temporarily slow processing
- If the status shows `errored`, ensure your documents aren’t corrupted or password-protected
- Check that the file format is supported (see the Supported File Formats section above)
- Verify your API key has sufficient permissions
- For unsupported formats, you’ll receive a `400` error with the message: "Unsupported file format: [filename]. Please check our supported file formats documentation."
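If you handle these errors programmatically, the filename can be recovered from the documented message text. This sketch assumes you have already extracted the message string from the error body (its location in the response is covered by the Error Responses documentation, not here) and that the wording matches the text above exactly.

```python
import re
from typing import Optional

# Pattern built from the documented message; the non-greedy group allows dots in filenames.
_UNSUPPORTED = re.compile(
    r"^Unsupported file format: (?P<filename>.+?)\. "
    r"Please check our supported file formats documentation\.$"
)


def unsupported_filename(message: str) -> Optional[str]:
    """Return the filename from an unsupported-format message, else None."""
    match = _UNSUPPORTED.match(message)
    return match.group("filename") if match else None
```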
Contact support with the affected `file_id`s for assistance.
Error Responses
All endpoints return consistent error responses following the standard format. For detailed error information, see our Error Responses documentation.
Authorizations
Bearer authentication header of the form `Bearer <token>`, where `<token>` is your auth token.
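The bearer header, together with the query parameters listed below, can be assembled with the standard library. This is a minimal sketch: the tenant parameter's name is not shown on this page, so `tenant_id` is an assumption, while `sub_tenant_id` comes from the sub-tenant description earlier. The endpoint URL is also not shown, so the function returns only the pieces to attach to your request.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode


def auth_and_query(
    token: str, tenant_id: str, sub_tenant_id: Optional[str] = None
) -> Tuple[Dict[str, str], str]:
    """Build the bearer auth header and the query string for an upload call."""
    headers = {"Authorization": f"Bearer {token}"}
    params = {"tenant_id": tenant_id}  # parameter name is an assumption
    if sub_tenant_id is not None:
        # Omitting sub_tenant_id targets the default sub-tenant.
        params["sub_tenant_id"] = sub_tenant_id
    return headers, urlencode(params)
```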
Query Parameters
Unique identifier for the tenant/organization
"tenant_1234"
Optional sub-tenant identifier used to organize data within a tenant. If omitted, the default sub-tenant created during tenant setup will be used.
"sub_tenant_4567"
Body
The document file to upload (e.g., PDF, DOCX, TXT)
Optional JSON string array of file IDs for the uploaded content. If not provided or empty, IDs will be generated automatically.
JSON string containing tenant-level document metadata (e.g., department, compliance_tag)
Example: `{"department":"Finance","compliance_tag":"GDPR"}`
JSON string containing document-specific metadata (e.g., title, author, `file_id`). If `file_id` is not provided, the system will generate an ID automatically.
Example: `{"title":"Q1 Report.pdf","author":"Alice Smith","file_id":"custom_file_123"}`
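Because these metadata fields are JSON strings rather than nested objects, serializing them with a JSON encoder avoids hand-escaping the inner quotes seen in the examples. In this sketch only `document_metadata` is a field name that appears on this page; `tenant_metadata` and `file_ids` are hypothetical placeholders for the tenant-metadata and file-ID fields, whose names are not shown.

```python
import json
from typing import Any, Dict, List, Mapping, Optional


def metadata_form_fields(
    tenant_metadata: Mapping[str, Any],
    document_metadata: Mapping[str, Any],
    file_ids: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Serialize metadata and file IDs into JSON-string form fields."""
    fields = {
        "tenant_metadata": json.dumps(tenant_metadata),   # hypothetical field name
        "document_metadata": json.dumps(document_metadata),
    }
    if file_ids:
        # Omitted when empty, so the service generates IDs automatically.
        fields["file_ids"] = json.dumps(file_ids)          # hypothetical field name
    return fields
```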
Response
Successful Response
List of successfully uploaded files for processing
[
{
"file_id": "CortexDoc1234",
"filename": "document1.pdf"
},
{
"file_id": "CortexDoc4567",
"filename": "document2.docx"
}
]
Status message indicating that batch document parsing has been scheduled
true
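To track processing, you can index the returned list of uploaded files by filename. The name of the response field that holds this list is not shown above, so the hypothetical helper below takes the list itself; duplicate filenames within one batch would overwrite each other in the mapping.

```python
from typing import Dict, List, Mapping


def file_ids_by_name(uploaded: List[Mapping[str, str]]) -> Dict[str, str]:
    """Map each uploaded filename to the file_id returned for it."""
    return {item["filename"]: item["file_id"] for item in uploaded}
```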