Use the Try it button to try this API now in our playground. It’s the best way to check the full request and response in one place, customize your parameters, and generate ready-to-use code snippets.
Supported Apps
The following apps are currently supported for app source uploads:
File Storage & Cloud Services:
- drive: Google Drive
- dropbox: Dropbox Business
- dropboxpersonal: Dropbox Personal
- onedrive: Microsoft OneDrive
- sharepoint: Microsoft SharePoint

- intercom: Intercom
- salesforce: Salesforce
- hubspot: HubSpot

- msteams: Microsoft Teams
- gmail: Gmail
- slack: Slack
- outlook: Microsoft Outlook

- jira: Atlassian Jira
- confluence: Atlassian Confluence
- shortcut: Shortcut
- linear: Linear
- asana: Asana

- notion: Notion
- googlecalendar: Google Calendar
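A client can check an app identifier against the list above before uploading. The sketch below is only an illustration: the identifiers are copied from this page, while the helper name and the case-insensitive matching are assumptions rather than documented API behavior.

```python
# App identifiers copied from the supported-apps list on this page.
SUPPORTED_APPS = frozenset({
    "drive", "dropbox", "dropboxpersonal", "onedrive", "sharepoint",
    "intercom", "salesforce", "hubspot",
    "msteams", "gmail", "slack", "outlook",
    "jira", "confluence", "shortcut", "linear", "asana",
    "notion", "googlecalendar",
})


def is_supported_app(app_name: str) -> bool:
    """Return True if app_name matches a listed identifier.

    Matching is case-insensitive here; that is an assumption of this
    sketch, not something the API documentation states.
    """
    return app_name.strip().lower() in SUPPORTED_APPS
```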
App Source Processing Pipeline
When you upload app sources, each source goes through specialized processing pipelines tailored to the specific app type:
1. Immediate Upload & App Detection
- All app sources are immediately accepted and stored securely
- App type is automatically detected (Gmail, Slack, Notion, etc.)
- Each source is routed to its specialized processing pipeline
- You receive a confirmation response with individual file_ids for tracking
2. App-Specific Processing Phase
Each app source is processed using specialized pipelines:
- Gmail: Email parsing, thread reconstruction, attachment handling
- Slack: Message threading, channel context, user mentions
- Notion: Page hierarchy, block structure, database relationships
- Documents: Format-specific parsing (PDF, DOCX, etc.)
- Custom Apps: Configurable parsing based on app metadata
3. Content Extraction & Normalization
- Multi-format Support: Text, HTML, CSV, Markdown, and file attachments
- Context Preservation: Maintaining app-specific context and relationships
- Metadata Enrichment: Extracting app-specific metadata and timestamps
- Content Cleaning: Normalizing content while preserving structure
4. Intelligent Chunking
- App-aware chunking strategies preserve context and relationships
- Thread-based chunking for Gmail and Slack conversations
- Hierarchical chunking for Notion pages and databases
- Metadata is preserved and associated with each chunk
5. Embedding Generation
- Each chunk is converted into high-dimensional vector embeddings
- Embeddings capture semantic meaning and app-specific context
- Vectors are optimized for similarity search and retrieval
- Cross-app relationship embeddings for related content
6. Indexing & Database Updates
- Embeddings are stored in our vector database for fast similarity search
- Full-text search indexes are created for keyword-based queries
- App-specific metadata is indexed for filtering and faceted search
- Cross-references are established between related app sources
7. Quality Assurance
- App-specific quality checks ensure processing accuracy
- Content validation verifies extracted text completeness
- Relationship validation ensures proper context preservation
- Embedding quality is assessed for optimal retrieval performance
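To make the idea of thread-based chunking from step 4 more concrete, here is a minimal sketch. It is not the service's implementation: the message keys (thread_id, ts, text), the function name, and the character budget are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def chunk_by_thread(messages: List[dict], max_chars: int = 1000) -> List[dict]:
    """Group messages by thread, then pack each thread into size-bounded chunks.

    Each message is a dict with hypothetical keys: thread_id, ts, text.
    Every chunk keeps its thread_id so metadata stays attached to the text.
    """
    threads: Dict[str, List[dict]] = defaultdict(list)
    for msg in messages:
        threads[msg["thread_id"]].append(msg)

    chunks: List[dict] = []
    for thread_id, msgs in threads.items():
        current: List[str] = []
        size = 0
        # Keep messages in chronological order inside a thread.
        for msg in sorted(msgs, key=lambda m: m["ts"]):
            text = msg["text"]
            # Start a new chunk when adding this message would exceed the budget.
            if current and size + len(text) > max_chars:
                chunks.append({"thread_id": thread_id, "text": "\n".join(current)})
                current, size = [], 0
            current.append(text)
            size += len(text)
        if current:
            chunks.append({"thread_id": thread_id, "text": "\n".join(current)})
    return chunks
```

Messages from different threads never share a chunk, which is the property the pipeline description attributes to thread-based chunking.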
Recommended: For optimal performance, limit each batch to a maximum of 20 app sources per request. Send multiple batch requests with an interval of 1 second between each request.
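The batching recommendation above could be followed with a helper along these lines. This is a sketch under assumptions: send is a placeholder for whatever function actually performs the upload request, and the helper names are hypothetical. Only the limit of 20 sources and the 1 second interval come from this page.

```python
import time
from typing import Callable, Iterator, List, Sequence

MAX_BATCH_SIZE = 20      # recommended maximum number of app sources per request
BATCH_INTERVAL_S = 1.0   # recommended pause between batch requests, in seconds


def batched(sources: Sequence[dict], size: int = MAX_BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` sources."""
    for start in range(0, len(sources), size):
        yield list(sources[start:start + size])


def upload_in_batches(
    sources: Sequence[dict],
    send: Callable[[List[dict]], object],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send sources in batches, pausing between requests.

    `send` is a caller-supplied function (hypothetical here) that uploads
    one batch. Returns the number of batches sent.
    """
    sent = 0
    for index, batch in enumerate(batched(sources)):
        if index > 0:
            sleep(BATCH_INTERVAL_S)  # wait before every request except the first
        send(batch)
        sent += 1
    return sent
```

Passing sleep as a parameter keeps the pacing logic testable without real delays.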
File ID Management: When you provide a file_id as a key in the document_metadata object, that specific ID will be used to identify your content. If no file_id is provided in the document_metadata, the system will automatically generate a unique identifier for you. This allows you to maintain consistent references to your content across your application while ensuring every piece of content has a unique identifier.
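As a small illustration of the file ID behavior described above, the following sketch builds the document_metadata JSON string with an optional file_id. The function name is hypothetical; the title, author, and file_id keys follow the document_metadata example given in the Body section of this page.

```python
import json
from typing import Optional


def build_document_metadata(title: str, author: str, file_id: Optional[str] = None) -> str:
    """Return document_metadata as a JSON string.

    When file_id is omitted, the key is left out entirely so that the
    system can generate an identifier, as described on this page.
    """
    metadata = {"title": title, "author": author}
    if file_id is not None:
        metadata["file_id"] = file_id
    return json.dumps(metadata)
```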
Duplicate File ID Behavior
When you upload app sources with file_ids that already exist in your tenant:
- Overwrite Behavior: Each existing app source with a matching file_id will be completely replaced with the new source
- Processing: Each new app source will go through its specialized processing pipeline independently
- Search Results: Previous search results and embeddings from old app sources will be replaced with the new sources’ content
- Idempotency: Uploading the same app sources with the same file_ids multiple times is safe and will result in the same final state
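The overwrite and idempotency rules above can be pictured with a toy in-memory model. This is purely illustrative and says nothing about how the service stores data; the function name and the dictionary-based index are assumptions of the sketch.

```python
from typing import Dict, List


def apply_upload(index: Dict[str, dict], sources: List[dict]) -> Dict[str, dict]:
    """Return a new index in which each uploaded source replaces any
    existing entry that has the same file_id (toy model only)."""
    updated = dict(index)
    for source in sources:
        updated[source["file_id"]] = source  # full replacement, no merging
    return updated
```

Applying the same upload twice yields the same index as applying it once, which mirrors the idempotency guarantee described in the list.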
Warning: Existing app sources with matching file_ids will be permanently removed and replaced. This action cannot be undone.
Attachments Field Structure
The attachments field allows you to include additional files, documents, or content alongside your main app source. Each attachment supports multiple content formats and can contain nested structures for complex documents.
Attachment Object Structure
When to Use Each Field
Core Identification Fields:
- id (optional): Unique identifier for the attachment. If not provided, system generates one automatically.
- title (optional): Human-readable name for the attachment.
- url (optional): External URL where the attachment can be accessed.
- content_type (optional): MIME type of the attachment (e.g., “application/pdf”, “text/plain”).
- content_url (optional): API endpoint URL for retrieving attachment content.

- content.text: Use for plain text content. Best for simple text documents, notes, or extracted text from other formats.
- content.html_base64: Use for HTML content encoded in base64. Ideal for web pages, rich text documents, or formatted content that needs to preserve HTML structure.
- content.csv_base64: Use for CSV data encoded in base64. Perfect for tabular data, spreadsheets, or structured data exports.
- content.markdown: Use for Markdown-formatted content. Great for documentation, README files, or any content that uses Markdown syntax.
- content.files: Use for binary file attachments as an array of file objects. Each file object should contain at least a name and data field (base64 encoded).
- content.layout: Use for structured document layouts as an array of layout objects. Useful for complex documents with sections, headers, or custom formatting.

- misc (optional): Dictionary for storing custom metadata, additional properties, or app-specific information about the attachment.
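The field descriptions above suggest an attachment shaped roughly like the one built below. This is a sketch, not the authoritative schema: the nesting of html_base64 inside a content object is inferred from the dotted field names, and the helper name is hypothetical.

```python
import base64


def build_html_attachment(title: str, html: str, url: str = "") -> dict:
    """Build an attachment dict carrying HTML content encoded in base64.

    Assumes `content` is a nested object keyed by format name, as the
    dotted names (content.html_base64, content.text, ...) imply.
    """
    encoded = base64.b64encode(html.encode("utf-8")).decode("ascii")
    attachment = {
        "title": title,
        "content_type": "text/html",
        "content": {"html_base64": encoded},
        "misc": {},  # optional custom metadata
    }
    if url:
        attachment["url"] = url
    return attachment
```

The id field is left out here so that the system generates one, as the description of id allows.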
Content Format Guidelines
For Text Content:
Best Practices
- Choose the Right Format: Use the content field that best matches your data type for optimal processing.
- Base64 Encoding: Always encode binary data (HTML, CSV, files) in base64 format.
- File Size Limits: Keep individual attachments under 10MB for optimal processing performance.
- Metadata Usage: Use the misc field to store app-specific metadata that might be useful for filtering or organization.
- Content Type Specification: Always specify content_type when possible to help with proper content processing.
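A client-side check for the attachment size guideline might look like the sketch below. It assumes that “10MB” means 10 * 1024 * 1024 bytes measured on the decoded data, and that file objects live under content.files with name and data keys as described earlier; the helper names are hypothetical.

```python
import base64

# Assumption: the 10MB guideline is read as 10 MiB of decoded bytes.
MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024


def decoded_size(b64_text: str) -> int:
    """Return the number of bytes a base64 string decodes to."""
    return len(base64.b64decode(b64_text))


def oversized_files(attachment: dict, limit: int = MAX_ATTACHMENT_BYTES) -> list:
    """Return the names of files in content.files whose decoded size exceeds limit."""
    files = attachment.get("content", {}).get("files", [])
    return [f["name"] for f in files if decoded_size(f["data"]) > limit]
```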
Error Responses
All endpoints return consistent error responses following the standard format. For detailed error information, see our Error Responses documentation.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Query Parameters
Unique identifier for the tenant/organization
"tenant_1234"
Optional sub-tenant identifier used to organize data within a tenant. If omitted, the default sub-tenant created during tenant setup will be used.
"sub_tenant_4567"
Body
List of structured source objects containing app-generated data to be indexed
Stable, unique identifier for the source. If omitted, one may be generated upstream.
"<id>"
Short human-readable title for the source.
"<title>"
High-level category of the source (e.g., document, email, ticket).
"<type>"
Optional long-form description providing additional context.
"<description>"
Free-form notes for internal use or ingestion hints.
"<note>"
Canonical URL or reference link associated with the source.
"<url>"
Creation or last-updated timestamp of the source in ISO-8601 format.
"<timestamp>"
Primary content payload used for indexing and retrieval.
JSON string containing tenant-level document metadata (e.g., department, compliance_tag)
Example: "{\"department\":\"Finance\",\"compliance_tag\":\"GDPR\"}"
JSON string containing document-specific metadata (e.g., title, author, file_id). If file_id is not provided, the system will generate an ID automatically.
Example: "{\"title\":\"Q1 Report.pdf\",\"author\":\"Alice Smith\",\"file_id\":\"custom_file_123\"}"
System-provided attributes (e.g., app_name, local file size) not intended for search filtering.
Attachments related to the source such as images, PDFs, or supplemental files.
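Putting the authorization header, query parameters, and body together, a request could be assembled roughly as follows. Several things here are assumptions, because this page does not show them: the endpoint URL is a placeholder, the query parameter names tenant_id and sub_tenant_id are guessed from the example values, and sending the list of source objects as the top-level JSON body is one plausible reading of the Body description. The sketch only builds the request and does not send it.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

# Hypothetical placeholder; substitute the real endpoint from the playground.
BASE_URL = "https://api.example.com/upload/app-sources"


def build_upload_request(
    token: str,
    tenant_id: str,
    sources: List[dict],
    sub_tenant_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for an app sources upload."""
    # Query parameter names are assumptions inferred from the example values.
    params = {"tenant_id": tenant_id}
    if sub_tenant_id:
        params["sub_tenant_id"] = sub_tenant_id
    url = f"{BASE_URL}?{urllib.parse.urlencode(params)}"

    return urllib.request.Request(
        url,
        data=json.dumps(sources).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # Bearer <token>, as documented
            "Content-Type": "application/json",
        },
    )
```

The resulting object can be passed to urllib.request.urlopen once the real endpoint and field names have been confirmed in the playground.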
Response
Successful Response
List of app sources successfully uploaded for indexing
[
  {
    "file_id": "CortexDoc1234",
    "filename": "document1.pdf"
  },
  {
    "file_id": "CortexDoc4567",
    "filename": "document2.docx"
  }
]
Status message indicating that the app sources upload has been scheduled
true
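Once the response arrives, the returned file_ids can be kept for tracking. The sketch below works on the list of uploaded entries shown in the example above; the function name is hypothetical, and the name of the response field that wraps this list is not shown on this page, so the function takes the list directly.

```python
from typing import Dict, List


def file_ids_by_name(uploaded: List[dict]) -> Dict[str, str]:
    """Map each filename to its file_id from the list of uploaded entries.

    Each entry is expected to carry file_id and filename keys, as in the
    example response on this page.
    """
    return {entry["filename"]: entry["file_id"] for entry in uploaded}
```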