POST /upload/upload_app_sources
Upload App Sources
curl --request POST \
  --url https://api.usecortex.ai/upload/upload_app_sources \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '[
  {
    "id": "<id>",
    "title": "<title>",
    "type": "<type>",
    "description": "<description>",
    "note": "<note>",
    "url": "<url>",
    "timestamp": "<timestamp>",
    "content": {
      "text": "<text>",
      "html_base64": "<html_base64>",
      "csv_base64": "<csv_base64>",
      "markdown": "<markdown>",
      "files": [
        {}
      ],
      "layout": []
    },
    "tenant_metadata": {},
    "document_metadata": {},
    "meta": {},
    "attachments": [
      {
        "id": "<id>",
        "url": "<url>",
        "title": "<title>",
        "content_type": "<content_type>",
        "content_url": "<content_url>",
        "misc": {},
        "content": {
          "text": "<text>",
          "html_base64": "<html_base64>",
          "csv_base64": "<csv_base64>",
          "markdown": "<markdown>",
          "files": [
            {}
          ],
          "layout": []
        }
      }
    ]
  }
]'
{
  "uploaded": [
    {
      "file_id": "CortexDoc1234",
      "filename": "document1.pdf"
    },
    {
      "file_id": "CortexDoc4567",
      "filename": "document2.docx"
    }
  ],
  "message": "<string>",
  "success": true
}

Sample

curl --request POST \
  --url 'https://api.usecortex.ai/upload/upload_app_sources?tenant_id=tenant_1234&sub_tenant_id=sub_tenant_4567' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '[
  {
    "id": "", "title": "", "type": "", "description": "", "note": "", "url": "", "timestamp": "",
    "content": {
      "text": "<string>", "html_base64": "<string>", "csv_base64": "<string>", "markdown": "<string>", "files": [{}], "layout": [{}]
    },
    "tenant_metadata": {},
    "document_metadata": {},
    "meta": {},
    "attachments": [
      {
        "id": "",
        "url": "",
        "title": "",
        "content_type": "",
        "content_url": "",
        "misc": {},
        "content": {
          "text": "<string>",
          "html_base64": "<string>",
          "csv_base64": "<string>",
          "markdown": "<string>",
          "files": [
            {}
          ],
          "layout": [
            {}
          ]
        }
      }
    ]
  }
]'
Works similarly to the upload endpoint, but is designed to upload multiple app sources (e.g., Gmail, Slack, Notion) in a single request for processing and indexing. Each app source is handled by a specialized pipeline inside Cortex and can include various content types with rich metadata.
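
For readers calling the endpoint from Python rather than curl, the request in the sample above can be sketched with the standard library alone. The function name build_upload_request is hypothetical; the URL, query parameters, and headers are taken from the sample. The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.usecortex.ai/upload/upload_app_sources"


def build_upload_request(api_key, tenant_id, sources, sub_tenant_id=""):
    """Build (but do not send) a POST request for a list of app sources.

    Hypothetical helper: only the URL, query parameters, and headers
    come from the documentation.
    """
    params = {"tenant_id": tenant_id}
    if sub_tenant_id:
        # sub_tenant_id is optional; leaving it out selects the default sub-tenant.
        params["sub_tenant_id"] = sub_tenant_id
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    body = json.dumps(sources).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
    )


request = build_upload_request(
    "YOUR_API_KEY",
    "tenant_1234",
    [{"id": "msg_1", "title": "Hello", "type": "email", "content": {"text": "Hi"}}],
    sub_tenant_id="sub_tenant_4567",
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the URL and body easy to inspect before any data leaves your machine.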

Supported Apps

The following apps are currently supported for app source uploads:
File Storage & Cloud Services:
  • drive - Google Drive
  • dropbox - Dropbox Business
  • dropboxpersonal - Dropbox Personal
  • onedrive - Microsoft OneDrive
  • sharepoint - Microsoft SharePoint
CRM & Sales:
  • intercom - Intercom
  • salesforce - Salesforce
  • hubspot - HubSpot
Communication & Collaboration:
  • msteams - Microsoft Teams
  • gmail - Gmail
  • slack - Slack
  • outlook - Microsoft Outlook
Project Management:
  • jira - Atlassian Jira
  • confluence - Atlassian Confluence
  • shortcut - Shortcut
  • linear - Linear
  • asana - Asana
Productivity & Organization:
  • notion - Notion
  • googlecalendar - Google Calendar
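
A client may want to check an app identifier against this list before uploading. The sketch below simply transcribes the list into a lookup table; the names SUPPORTED_APPS and app_category are hypothetical, and where the identifier belongs in the request body is not specified here.

```python
# Identifiers transcribed from the supported-apps list, grouped by category.
SUPPORTED_APPS = {
    "File Storage & Cloud Services": ["drive", "dropbox", "dropboxpersonal", "onedrive", "sharepoint"],
    "CRM & Sales": ["intercom", "salesforce", "hubspot"],
    "Communication & Collaboration": ["msteams", "gmail", "slack", "outlook"],
    "Project Management": ["jira", "confluence", "shortcut", "linear", "asana"],
    "Productivity & Organization": ["notion", "googlecalendar"],
}


def app_category(app_id):
    """Return the category of a supported app identifier, or None if unsupported."""
    for category, apps in SUPPORTED_APPS.items():
        if app_id in apps:
            return category
    return None


print(app_category("gmail"))
print(app_category("trello"))
```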

App Source Processing Pipeline

When you upload app sources, each source goes through specialized processing pipelines tailored to the specific app type:

1. Immediate Upload & App Detection

  • All app sources are immediately accepted and stored securely
  • App type is automatically detected (Gmail, Slack, Notion, etc.)
  • Each source is routed to its specialized processing pipeline
  • You receive a confirmation response with individual file_ids for tracking

2. App-Specific Processing Phase

Each app source is processed using specialized pipelines:
  • Gmail: Email parsing, thread reconstruction, attachment handling
  • Slack: Message threading, channel context, user mentions
  • Notion: Page hierarchy, block structure, database relationships
  • Documents: Format-specific parsing (PDF, DOCX, etc.)
  • Custom Apps: Configurable parsing based on app metadata

3. Content Extraction & Normalization

  • Multi-format Support: Text, HTML, CSV, Markdown, and file attachments
  • Context Preservation: Maintaining app-specific context and relationships
  • Metadata Enrichment: Extracting app-specific metadata and timestamps
  • Content Cleaning: Normalizing content while preserving structure

4. Intelligent Chunking

  • App-aware chunking strategies preserve context and relationships
  • Thread-based chunking for Gmail and Slack conversations
  • Hierarchical chunking for Notion pages and databases
  • Metadata is preserved and associated with each chunk
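
To make the idea of thread-based chunking concrete, here is an illustrative sketch. It is not Cortex's implementation, which is not described on this page; the message fields thread_id, timestamp, and text are assumptions chosen for the example.

```python
from collections import defaultdict


def chunk_by_thread(messages):
    """Group messages by thread_id and join each thread in timestamp order.

    Illustration only: field names and the one-chunk-per-thread rule are assumptions.
    """
    threads = defaultdict(list)
    for message in messages:
        threads[message["thread_id"]].append(message)

    chunks = []
    for thread_id, items in threads.items():
        # ISO-8601 timestamps in the same timezone sort correctly as strings.
        items.sort(key=lambda m: m["timestamp"])
        chunks.append({
            "thread_id": thread_id,
            "text": "\n".join(m["text"] for m in items),
            "message_count": len(items),
        })
    return chunks


messages = [
    {"thread_id": "t1", "timestamp": "2024-01-01T10:05:00Z", "text": "Sounds good."},
    {"thread_id": "t2", "timestamp": "2024-01-01T11:00:00Z", "text": "Unrelated topic."},
    {"thread_id": "t1", "timestamp": "2024-01-01T10:00:00Z", "text": "Ship on Friday?"},
]
chunks = chunk_by_thread(messages)
```

Keeping a whole thread in one chunk means a reply such as "Sounds good." stays next to the question it answers.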

5. Embedding Generation

  • Each chunk is converted into high-dimensional vector embeddings
  • Embeddings capture semantic meaning and app-specific context
  • Vectors are optimized for similarity search and retrieval
  • Cross-app relationship embeddings for related content

6. Indexing & Database Updates

  • Embeddings are stored in our vector database for fast similarity search
  • Full-text search indexes are created for keyword-based queries
  • App-specific metadata is indexed for filtering and faceted search
  • Cross-references are established between related app sources

7. Quality Assurance

  • App-specific quality checks ensure processing accuracy
  • Content validation verifies extracted text completeness
  • Relationship validation ensures proper context preservation
  • Embedding quality is assessed for optimal retrieval performance
Processing Time: App sources are processed in parallel using specialized pipelines. Most sources are fully processed and searchable within 2-5 minutes. Complex sources with multiple attachments may take up to 10 minutes. You can check processing status using the individual file_ids returned in the response.
Recommended: For optimal performance, limit each batch to a maximum of 20 app sources per request. Send multiple batch requests with an interval of 1 second between each request.
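
The batching recommendation above can be sketched as follows. The helper names are hypothetical, and send stands for whatever function actually posts one batch; the sleep function is injectable so the pacing can be tested without waiting.

```python
import time


def batches(sources, size=20):
    """Yield consecutive slices of at most `size` sources."""
    for start in range(0, len(sources), size):
        yield sources[start:start + size]


def send_in_batches(sources, send, size=20, interval=1.0, sleep=time.sleep):
    """Call send(batch) for each batch, pausing `interval` seconds between calls."""
    results = []
    for index, batch in enumerate(batches(sources, size)):
        if index > 0:
            # Pause between requests, but not before the first one.
            sleep(interval)
        results.append(send(batch))
    return results


# Dry run: "send" just reports the batch size and "sleep" records the pauses.
pauses = []
sizes = send_in_batches(list(range(45)), send=len, sleep=pauses.append)
print(sizes, pauses)
```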
File ID Management: When you provide a file_id as a key in the document_metadata object, that specific ID will be used to identify your content. If no file_id is provided in the document_metadata, the system will automatically generate a unique identifier for you. This allows you to maintain consistent references to your content across your application while ensuring every piece of content has a unique identifier.
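
As a small illustration of supplying your own identifier, the sketch below adds a file_id key to a source's document_metadata. It assumes document_metadata is sent as a JSON object, as in the sample request; with_file_id is a hypothetical helper name.

```python
import copy


def with_file_id(source, file_id):
    """Return a copy of `source` whose document_metadata carries `file_id`."""
    updated = copy.deepcopy(source)  # leave the caller's dict untouched
    updated.setdefault("document_metadata", {})["file_id"] = file_id
    return updated


source = {
    "id": "123456",
    "title": "Weekly sync",
    "document_metadata": {"author": "Alice Smith"},
}
tagged = with_file_id(source, "gmail_123456")
```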

Duplicate File ID Behavior

When you upload app sources with file_ids that already exist in your tenant:
  • Overwrite Behavior: Each existing app source with a matching file_id will be completely replaced with the new source
  • Processing: Each new app source will go through its specialized processing pipeline independently
  • Search Results: Previous search results and embeddings from old app sources will be replaced with the new sources’ content
  • Idempotency: Uploading the same app sources with the same file_ids multiple times is safe and will result in the same final state
Important: When overwriting existing app sources, all previous chunks, embeddings, and search indexes associated with those file_ids will be permanently removed and replaced. This action cannot be undone.
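
Because overwrites are permanent, a client may want to flag them before sending. The sketch below assumes you keep your own record of previously uploaded file_ids (known_file_ids); no lookup endpoint is implied, and the helper name is hypothetical.

```python
def split_new_and_overwriting(sources, known_file_ids):
    """Partition sources by whether their file_id is already in known_file_ids.

    known_file_ids is a locally maintained set; this is a client-side check only.
    """
    new, overwriting = [], []
    for source in sources:
        file_id = (source.get("document_metadata") or {}).get("file_id")
        if file_id is not None and file_id in known_file_ids:
            overwriting.append(source)
        else:
            new.append(source)
    return new, overwriting


sources = [
    {"id": "a", "document_metadata": {"file_id": "gmail_123456"}},
    {"id": "b", "document_metadata": {"file_id": "notion_345678"}},
    {"id": "c", "document_metadata": {}},  # no file_id: the system generates one
]
new, overwriting = split_new_and_overwriting(sources, {"gmail_123456", "slack_789012"})
```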
Example Success Response for Duplicate File IDs in App Upload:
{
  "message": "App sources uploaded successfully. Sources with existing file_ids have been overwritten.",
  "document_ids": ["gmail_123456", "slack_789012", "notion_345678", "drive_901234"],
  "overwritten_file_ids": ["gmail_123456", "slack_789012"],
  "status": "success"
}

Attachments Field Structure

The attachments field allows you to include additional files, documents, or content alongside your main app source. Each attachment supports multiple content formats and can contain nested structures for complex documents.

Attachment Object Structure

{
  "attachments": [
    {
      "id": "unique_attachment_id",
      "url": "https://example.com/document.pdf",
      "title": "Document Title",
      "content_type": "application/pdf",
      "content_url": "https://api.example.com/content/123",
      "misc": {
        "custom_field": "value"
      },
      "content": {
        "text": "Plain text content",
        "html_base64": "base64_encoded_html",
        "csv_base64": "base64_encoded_csv",
        "markdown": "# Markdown content",
        "files": [{"name": "file.pdf", "data": "base64_data"}],
        "layout": [{"type": "section", "content": "..."}]
      }
    }
  ]
}

When to Use Each Field

Core Identification Fields:
  • id (optional): Unique identifier for the attachment. If not provided, the system generates one automatically.
  • title (optional): Human-readable name for the attachment.
  • url (optional): External URL where the attachment can be accessed.
  • content_type (optional): MIME type of the attachment (e.g., “application/pdf”, “text/plain”).
  • content_url (optional): API endpoint URL for retrieving attachment content.
Content Storage Fields: Use these fields to store different types of content directly in the attachment:
  • content.text: Use for plain text content. Best for simple text documents, notes, or extracted text from other formats.
  • content.html_base64: Use for HTML content encoded in base64. Ideal for web pages, rich text documents, or formatted content that needs to preserve HTML structure.
  • content.csv_base64: Use for CSV data encoded in base64. Perfect for tabular data, spreadsheets, or structured data exports.
  • content.markdown: Use for Markdown-formatted content. Great for documentation, README files, or any content that uses Markdown syntax.
  • content.files: Use for binary file attachments as an array of file objects. Each file object should contain at least a name and data field (base64 encoded).
  • content.layout: Use for structured document layouts as an array of layout objects. Useful for complex documents with sections, headers, or custom formatting.
Metadata Field:
  • misc (optional): Dictionary for storing custom metadata, additional properties, or app-specific information about the attachment.
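
One way to apply the field guidance above is to choose the content field from the attachment's MIME type. This is a sketch under assumptions: the mapping from MIME types to fields is a guess at a sensible default rather than documented behavior, and content_for is a hypothetical helper.

```python
import base64


def content_for(name, data, content_type):
    """Pick a content field for raw bytes based on the MIME type (assumed mapping)."""
    if content_type == "text/plain":
        return {"text": data.decode("utf-8")}
    if content_type == "text/markdown":
        return {"markdown": data.decode("utf-8")}

    encoded = base64.b64encode(data).decode("ascii")
    if content_type == "text/html":
        return {"html_base64": encoded}
    if content_type == "text/csv":
        return {"csv_base64": encoded}
    # Anything else is sent as a file object with name and base64 data.
    return {"files": [{"name": name, "data": encoded}]}


attachment = {
    "title": "Greeting",
    "content_type": "text/html",
    "content": content_for("greeting.html", b"<h1>Hello World</h1>", "text/html"),
}
```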

Content Format Guidelines

For Text Content:
{
  "content": {
    "text": "This is plain text content that will be processed and indexed."
  }
}
For HTML Content:
{
  "content": {
    "html_base64": "PGgxPkhlbGxvIFdvcmxkPC9oMT4="
  }
}
For CSV Data:
{
  "content": {
    "csv_base64": "TmFtZSxBbW91bnQKSm9obiwxMDAKSmFuZSwyMDA="
  }
}
For Markdown:
{
  "content": {
    "markdown": "# Document Title\n\nThis is **markdown** content with formatting."
  }
}
For File Attachments:
{
  "content": {
    "files": [
      {
        "name": "document.pdf",
        "data": "JVBERi0xLjQKJcfsj6IKNSAwIG9iago8PAovVHlwZSAvUGFnZQovUGFyZW50IDMgMCBSCi9NZWRpYUJveCBbMCAwIDU5NSA4NDJdCi9SZXNvdXJjZXMgPDwKL0ZvbnQgPDwKL0YxIDIgMCBSCj4+Cj4+Ci9Db250ZW50cyA0IDAgUgo+PgplbmRvYmoK..."
      }
    ]
  }
}
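
The Base64 strings in the HTML and CSV examples above can be decoded to see what they carry, which also shows the encoding step in reverse. The sketch uses only Python's standard base64 module.

```python
import base64

html_b64 = "PGgxPkhlbGxvIFdvcmxkPC9oMT4="
csv_b64 = "TmFtZSxBbW91bnQKSm9obiwxMDAKSmFuZSwyMDA="

# Decode the example payloads back to text.
html = base64.b64decode(html_b64).decode("utf-8")
csv_text = base64.b64decode(csv_b64).decode("utf-8")

print(html)      # <h1>Hello World</h1>
print(csv_text)  # Name,Amount / John,100 / Jane,200 on three lines

# Encoding is the reverse: bytes in, ASCII string out.
round_trip = base64.b64encode(html.encode("utf-8")).decode("ascii")
```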

Best Practices

  1. Choose the Right Format: Use the content field that best matches your data type for optimal processing.
  2. Base64 Encoding: Always Base64-encode content placed in the html_base64, csv_base64, and files fields.
  3. File Size Limits: Keep individual attachments under 10MB for optimal processing performance.
  4. Metadata Usage: Use the misc field to store app-specific metadata that might be useful for filtering or organization.
  5. Content Type Specification: Always specify content_type when possible to help with proper content processing.
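
Two of these practices, the size guideline and the content_type recommendation, lend themselves to a client-side check. The sketch below is an assumption-laden illustration: it reads "10MB" as 10 * 1024 * 1024 bytes and measures decoded content, neither of which is confirmed by this page, and attachment_warnings is a hypothetical helper.

```python
import base64

MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def attachment_warnings(attachment):
    """Return a list of human-readable warnings for one attachment dict."""
    warnings = []
    if not attachment.get("content_type"):
        warnings.append("content_type is missing")

    content = attachment.get("content") or {}
    total = 0
    for key in ("text", "markdown"):
        total += len((content.get(key) or "").encode("utf-8"))
    for key in ("html_base64", "csv_base64"):
        total += len(base64.b64decode(content.get(key) or ""))
    for file in content.get("files") or []:
        total += len(base64.b64decode(file.get("data", "")))

    if total > MAX_ATTACHMENT_BYTES:
        warnings.append(f"decoded content is {total} bytes, over the 10MB guideline")
    return warnings


ok = attachment_warnings({"content_type": "text/plain", "content": {"text": "hello"}})
missing_type = attachment_warnings({"content": {"text": "hello"}})
```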

Error Responses

All endpoints return consistent error responses following the standard format. For detailed error information, see our Error Responses documentation.

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Query Parameters

tenant_id
string
required

Unique identifier for the tenant/organization

Example:

"tenant_1234"

sub_tenant_id
string
default:""

Optional sub-tenant identifier used to organize data within a tenant. If omitted, the default sub-tenant created during tenant setup will be used.

Example:

"sub_tenant_4567"

Body

application/json · SourceModel · object[]

List of structured source objects containing app-generated data to be indexed

id
string
default:""

Stable, unique identifier for the source. If omitted, one may be generated upstream.

Example:

"<id>"

title
string
default:""

Short human-readable title for the source.

Example:

"<title>"

type
string
default:""

High-level category of the source (e.g., document, email, ticket).

Example:

"<type>"

description
string
default:""

Optional long-form description providing additional context.

Example:

"<description>"

note
string
default:""

Free-form notes for internal use or ingestion hints.

Example:

"<note>"

url
string
default:""

Canonical URL or reference link associated with the source.

Example:

"<url>"

timestamp
string
default:""

Creation or last-updated timestamp of the source in ISO-8601 format.

Example:

"<timestamp>"

content
object

Primary content payload used for indexing and retrieval.

tenant_metadata
object

JSON string containing tenant-level document metadata (e.g., department, compliance_tag)

Example:

"{"department":"Finance","compliance_tag":"GDPR"}"

document_metadata
object

JSON string containing document-specific metadata (e.g., title, author, file_id). If file_id is not provided, the system will generate an ID automatically.

Example:

"{"title":"Q1 Report.pdf","author":"Alice Smith","file_id":"custom_file_123"}"

meta
object

System-provided attributes (e.g., app_name, local file size) not intended for search filtering.

attachments
AttachmentModel · object[]

Attachments related to the source such as images, PDFs, or supplemental files.
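
Putting the body fields together, a minimal source object might be assembled as in the sketch below. The helper make_source and the sample values are hypothetical; the field names and the ISO-8601 timestamp requirement come from the field descriptions above, and the metadata fields are sent as JSON objects as in the sample request.

```python
from datetime import datetime, timezone


def make_source(source_id, title, source_type, text, timestamp=None):
    """Build one source object with plain-text content and empty optional fields."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    return {
        "id": source_id,
        "title": title,
        "type": source_type,
        "description": "",
        "note": "",
        "url": "",
        # isoformat() on a timezone-aware datetime yields an ISO-8601 string.
        "timestamp": timestamp.isoformat(),
        "content": {"text": text},
        "tenant_metadata": {},
        "document_metadata": {},
        "meta": {},
        "attachments": [],
    }


source = make_source(
    "ticket_42",
    "Login fails",
    "ticket",
    "Users cannot log in.",
    datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
)
```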

Response

Successful Response

uploaded
FileUploadResult · object[]
required

List of app sources successfully uploaded for indexing

Example:
[
  {
    "file_id": "CortexDoc1234",
    "filename": "document1.pdf"
  },
  {
    "file_id": "CortexDoc4567",
    "filename": "document2.docx"
  }
]
message
string
required

Status message indicating that the app sources upload has been scheduled

success
boolean
default:true
Example:

true
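
A client will typically want the file_ids out of this response for later tracking. The sketch below parses a response body shaped like the schema above; uploaded_file_ids is a hypothetical helper, and treating a missing success field as true follows the documented default.

```python
import json


def uploaded_file_ids(response_body):
    """Return the file_ids from a successful response body, or raise on failure."""
    payload = json.loads(response_body)
    if not payload.get("success", True):
        raise RuntimeError(payload.get("message", "upload failed"))
    return [item["file_id"] for item in payload["uploaded"]]


example_body = """
{
  "uploaded": [
    {"file_id": "CortexDoc1234", "filename": "document1.pdf"},
    {"file_id": "CortexDoc4567", "filename": "document2.docx"}
  ],
  "message": "<string>",
  "success": true
}
"""
file_ids = uploaded_file_ids(example_body)
```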
