POST /upload/upload_app_sources
Upload App Sources
curl --request POST \
  --url https://api.usecortex.ai/upload/upload_app_sources \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '[
  {
    "id": "<id>",
    "title": "<title>",
    "type": "<type>",
    "description": "<description>",
    "note": "<note>",
    "url": "<url>",
    "timestamp": "<timestamp>",
    "content": {
      "text": "<text>",
      "html_base64": "<html_base64>",
      "csv_base64": "<csv_base64>",
      "markdown": "<markdown>",
      "files": [
        {}
      ],
      "layout": []
    },
    "tenant_metadata": {},
    "document_metadata": {},
    "meta": {},
    "attachments": [
      {
        "id": "<id>",
        "url": "<url>",
        "title": "<title>",
        "content_type": "<content_type>",
        "content_url": "<content_url>",
        "misc": {},
        "content": {
          "text": "<text>",
          "html_base64": "<html_base64>",
          "csv_base64": "<csv_base64>",
          "markdown": "<markdown>",
          "files": [
            {}
          ],
          "layout": []
        }
      }
    ]
  }
]'
{
  "uploaded": [
    {
      "file_id": "CortexDoc1234",
      "filename": "document1.pdf"
    },
    {
      "file_id": "CortexDoc4567",
      "filename": "document2.docx"
    }
  ],
  "message": "<string>",
  "success": true
}

Sample

curl --request POST \
  --url 'https://api.usecortex.ai/upload/upload_app_sources?tenant_id=tenant_1234&sub_tenant_id=sub_tenant_4567' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '[
  {
    "id": "", "title": "", "type": "", "description": "", "note": "", "url": "", "timestamp": "",
    "content": {
      "text": "<string>", "html_base64": "<string>", "csv_base64": "<string>", "markdown": "<string>", "files": [{}], "layout": [{}]
    },
    "tenant_metadata": {},
    "document_metadata": {},
    "meta": {},
    "attachments": [
      {
        "id": "",
        "url": "",
        "title": "",
        "content_type": "",
        "content_url": "",
        "misc": {},
        "content": {
          "text": "<string>",
          "html_base64": "<string>",
          "csv_base64": "<string>",
          "markdown": "<string>",
          "files": [
            {}
          ],
          "layout": [
            {}
          ]
        }
      }
    ]
  }
]'

SDK Examples

TypeScript:
const result = await client.upload.uploadAppSources({
  tenant_id: "tenant_1234",
  sub_tenant_id: "sub_tenant_4567",
  body: [
    {
      id: "user-guide-1",
      title: "Feature X Guide",
      content: { text: "How to use feature X" },
      document_metadata: { source: "database" }
    },
    {
      id: "user-guide-2", 
      title: "Feature Y Guide",
      content: { text: "How to use feature Y" },
      document_metadata: { source: "api" }
    }
  ]
});
Works similarly to the upload endpoint, but is designed to upload multiple app sources (e.g., Gmail, Slack, Notion) in a single request for processing and indexing. Each app source is handled by a specialized pipeline inside Cortex and can include various content types with rich metadata.
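
The request shape used in the samples above can also be assembled in Python. The sketch below only builds the URL, headers, and JSON body; it does not send anything. The helper name `build_upload_request` is hypothetical and is not part of any Cortex SDK. The endpoint, query parameters, and headers are taken from the curl sample.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.usecortex.ai/upload/upload_app_sources"


def build_upload_request(api_key, tenant_id, sources, sub_tenant_id=""):
    """Assemble (but do not send) one upload_app_sources request.

    Hypothetical helper: the return value is a plain dict, not an SDK object.
    """
    params = {"tenant_id": tenant_id}
    if sub_tenant_id:
        # sub_tenant_id defaults to "" in the reference, so omit it when empty.
        params["sub_tenant_id"] = sub_tenant_id
    return {
        "method": "POST",
        "url": f"{BASE_URL}?{urlencode(params)}",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        # The body is a JSON array of source objects.
        "json": list(sources),
    }


request = build_upload_request(
    "YOUR_API_KEY",
    "tenant_1234",
    [
        {"id": "user-guide-1", "title": "Feature X Guide",
         "content": {"text": "How to use feature X"},
         "document_metadata": {"source": "database"}},
        {"id": "user-guide-2", "title": "Feature Y Guide",
         "content": {"text": "How to use feature Y"},
         "document_metadata": {"source": "api"}},
    ],
    sub_tenant_id="sub_tenant_4567",
)
print(request["url"])
```

The dict keys were chosen so that it could be passed to an HTTP client of your choice, for example as keyword arguments to `requests.request` if the third-party `requests` package is installed.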

Supported Apps

The following apps are currently supported for app source uploads:
File Storage & Cloud Services:
  • drive - Google Drive
  • dropbox - Dropbox Business
  • dropboxpersonal - Dropbox Personal
  • onedrive - Microsoft OneDrive
  • sharepoint - Microsoft SharePoint
CRM & Sales:
  • intercom - Intercom
  • salesforce - Salesforce
  • hubspot - HubSpot
Communication & Collaboration:
  • msteams - Microsoft Teams
  • gmail - Gmail
  • slack - Slack
  • outlook - Microsoft Outlook
Project Management:
  • jira - Atlassian Jira
  • confluence - Atlassian Confluence
  • shortcut - Shortcut
  • linear - Linear
  • asana - Asana
Productivity & Organization:
  • notion - Notion
  • googlecalendar - Google Calendar
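
A client-side check against the list above can catch typos in app identifiers before a batch is sent. This is a minimal sketch that assumes the identifier is carried in each source's `type` field, which this page does not confirm. The function name `unsupported_sources` is hypothetical. The API may also accept values outside this list (custom apps are mentioned under processing), so treat the result as a warning rather than a hard failure.

```python
# Identifiers copied from the Supported Apps list.
SUPPORTED_APPS = {
    "drive", "dropbox", "dropboxpersonal", "onedrive", "sharepoint",
    "intercom", "salesforce", "hubspot",
    "msteams", "gmail", "slack", "outlook",
    "jira", "confluence", "shortcut", "linear", "asana",
    "notion", "googlecalendar",
}


def unsupported_sources(sources):
    """Return the ids of sources whose `type` is not a listed app identifier.

    Assumption: `type` holds the app identifier (e.g. "gmail").
    """
    return [
        source.get("id", "")
        for source in sources
        if source.get("type", "") not in SUPPORTED_APPS
    ]


flagged = unsupported_sources([
    {"id": "a", "type": "gmail"},
    {"id": "b", "type": "gmial"},  # typo, will be flagged
])
print(flagged)
```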

App Source Processing Pipeline

When you upload app sources, each source goes through a processing pipeline tailored to its app type:

1. Immediate Upload & App Detection

  • All app sources are immediately accepted and stored securely
  • App type is automatically detected (Gmail, Slack, Notion, etc.)
  • Each source is routed to its specialized processing pipeline
  • You receive a confirmation response with individual file_ids for tracking
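
To keep the returned file_ids for tracking, the confirmation response can be reduced to a list of IDs. The sketch below assumes the response has already been parsed from JSON into a dict shaped like the example response at the top of this page (`uploaded`, `message`, `success`). The helper name `uploaded_file_ids` is hypothetical.

```python
def uploaded_file_ids(response):
    """Return the file_id of every entry in the response's `uploaded` list.

    Raises RuntimeError when `success` is explicitly false.
    """
    if not response.get("success", True):
        raise RuntimeError(response.get("message", "upload failed"))
    return [item["file_id"] for item in response.get("uploaded", [])]


example_response = {
    "uploaded": [
        {"file_id": "CortexDoc1234", "filename": "document1.pdf"},
        {"file_id": "CortexDoc4567", "filename": "document2.docx"},
    ],
    "message": "<string>",
    "success": True,
}
print(uploaded_file_ids(example_response))
```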

2. App-Specific Processing Phase

Each app source is processed using specialized pipelines:
  • Gmail: Email parsing, thread reconstruction, attachment handling
  • Slack: Message threading, channel context, user mentions
  • Notion: Page hierarchy, block structure, database relationships
  • Documents: Format-specific parsing (PDF, DOCX, etc.)
  • Custom Apps: Configurable parsing based on app metadata

3. Content Extraction & Normalization

  • Multi-format Support: Text, HTML, CSV, Markdown, and file attachments
  • Context Preservation: Maintaining app-specific context and relationships
  • Metadata Enrichment: Extracting app-specific metadata and timestamps
  • Content Cleaning: Normalizing content while preserving structure

4. Intelligent Chunking

  • App-aware chunking strategies preserve context and relationships
  • Thread-based chunking for Gmail and Slack conversations
  • Hierarchical chunking for Notion pages and databases
  • Metadata is preserved and associated with each chunk

5. Embedding Generation

  • Each chunk is converted into high-dimensional vector embeddings
  • Embeddings capture semantic meaning and app-specific context
  • Vectors are optimized for similarity search and retrieval
  • Cross-app relationship embeddings for related content

6. Indexing & Database Updates

  • Embeddings are stored in our vector database for fast similarity search
  • Full-text search indexes are created for keyword-based queries
  • App-specific metadata is indexed for filtering and faceted search
  • Cross-references are established between related app sources

7. Quality Assurance

  • App-specific quality checks ensure processing accuracy
  • Content validation verifies extracted text completeness
  • Relationship validation ensures proper context preservation
  • Embedding quality is assessed for optimal retrieval performance
Processing Time: App sources are processed in parallel using specialized pipelines. Most sources are fully processed and searchable within 2 to 5 minutes. Complex sources with multiple attachments may take up to 10 minutes. You can check processing status using the individual file_ids returned in the response.
Recommended: For optimal performance, send at most 20 app sources per request. If you have more, split them into multiple batch requests and wait 1 second between requests.
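
The batching recommendation can be sketched as follows. The names `batches` and `upload_in_batches` are hypothetical, and `send` stands in for whatever function performs the actual HTTP call in your application. The `sleep` parameter is injectable only so the pacing can be tested without waiting.

```python
import time

MAX_BATCH_SIZE = 20           # recommended maximum sources per request
BATCH_INTERVAL_SECONDS = 1.0  # recommended pause between requests


def batches(sources, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of `sources` with at most `size` items each."""
    for start in range(0, len(sources), size):
        yield sources[start:start + size]


def upload_in_batches(sources, send, interval=BATCH_INTERVAL_SECONDS,
                      sleep=time.sleep):
    """Call `send(batch)` for each batch, pausing `interval` seconds between calls.

    `send` is a caller-supplied function (an assumption of this sketch);
    its return values are collected and returned in order.
    """
    responses = []
    for index, batch in enumerate(batches(sources)):
        if index > 0:
            sleep(interval)  # no pause is needed before the first request
        responses.append(send(batch))
    return responses
```

With 45 sources this produces three calls to `send` (20, 20, and 5 sources) with two pauses in between.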
File ID Management: To choose the identifier for a source yourself, set file_id as a key in its document_metadata object. If document_metadata has no file_id, the system generates a unique identifier automatically. Either way, every source has a unique identifier, and supplying your own lets you keep references consistent across your application.
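
Setting your own identifier amounts to adding one key to `document_metadata`. A minimal sketch, with a hypothetical helper name `with_file_id`, that returns a copy so the original source dict is left untouched:

```python
import copy


def with_file_id(source, file_id):
    """Return a copy of `source` with document_metadata.file_id set."""
    updated = copy.deepcopy(source)
    metadata = updated.get("document_metadata") or {}
    metadata["file_id"] = file_id
    updated["document_metadata"] = metadata
    return updated


source = {
    "id": "user-guide-1",
    "title": "Feature X Guide",
    "content": {"text": "How to use feature X"},
    "document_metadata": {"source": "database"},
}
tagged = with_file_id(source, "guide_feature_x")
print(tagged["document_metadata"])
```

Reusing the same file_id on a later upload replaces the earlier source, as described under Duplicate File ID Behavior.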

Duplicate File ID Behavior

When you upload app sources with file_ids that already exist in your tenant:
  • Overwrite Behavior: Each existing app source with a matching file_id will be completely replaced with the new source
  • Processing: Each new app source will go through its specialized processing pipeline independently
  • Search Results: Previous search results and embeddings from old app sources will be replaced with the new sources’ content
  • Idempotency: Uploading the same app sources with the same file_ids multiple times is safe and will result in the same final state
Important: When overwriting existing app sources, all previous chunks, embeddings, and search indexes associated with those file_ids will be permanently removed and replaced. This action cannot be undone.
Example Success Response for Duplicate File IDs in App Upload:
{
  "message": "App sources uploaded successfully. Sources with existing file_ids have been overwritten.",
  "document_ids": ["gmail_123456", "slack_789012", "notion_345678", "drive_901234"],
  "overwritten_file_ids": ["gmail_123456", "slack_789012"],
  "status": "success"
}
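
Because overwrites cannot be undone, it can be useful to log which IDs were replaced. The sketch below assumes a parsed response shaped exactly like the example above (`document_ids` and `overwritten_file_ids`). That shape differs from the `uploaded` list shown in the main response example, so check which fields your responses actually contain. The helper name `split_overwritten` is hypothetical.

```python
def split_overwritten(response):
    """Split document_ids into (newly_created, overwritten), preserving order."""
    overwritten = set(response.get("overwritten_file_ids", []))
    document_ids = response.get("document_ids", [])
    created = [doc_id for doc_id in document_ids if doc_id not in overwritten]
    replaced = [doc_id for doc_id in document_ids if doc_id in overwritten]
    return created, replaced


example = {
    "message": "App sources uploaded successfully. Sources with existing file_ids have been overwritten.",
    "document_ids": ["gmail_123456", "slack_789012", "notion_345678", "drive_901234"],
    "overwritten_file_ids": ["gmail_123456", "slack_789012"],
    "status": "success",
}
created, replaced = split_overwritten(example)
print("created:", created)
print("overwritten:", replaced)
```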

Attachments Field Structure

The attachments field allows you to include additional files, documents, or content alongside your main app source. Each attachment supports multiple content formats and can contain nested structures for complex documents.

Attachment Object Structure

{
  "attachments": [
    {
      "id": "unique_attachment_id",
      "url": "https://example.com/document.pdf",
      "title": "Document Title",
      "content_type": "application/pdf",
      "content_url": "https://api.example.com/content/123",
      "misc": {
        "custom_field": "value"
      },
      "content": {
        "text": "Plain text content",
        "html_base64": "base64_encoded_html",
        "csv_base64": "base64_encoded_csv",
        "markdown": "# Markdown content",
        "files": [{"name": "file.pdf", "data": "base64_data"}],
        "layout": [{"type": "section", "content": "..."}]
      }
    }
  ]
}

When to Use Each Field

Core Identification Fields:
  • id (optional): Unique identifier for the attachment. If omitted, the system generates one automatically.
  • title (optional): Human-readable name for the attachment.
  • url (optional): External URL where the attachment can be accessed.
  • content_type (optional): MIME type of the attachment (e.g., “application/pdf”, “text/plain”).
  • content_url (optional): API endpoint URL for retrieving attachment content.
Content Storage Fields: Use these fields to store different types of content directly in the attachment:
  • content.text: Use for plain text content. Best for simple text documents, notes, or extracted text from other formats.
  • content.html_base64: Use for HTML content encoded in base64. Ideal for web pages, rich text documents, or formatted content that needs to preserve HTML structure.
  • content.csv_base64: Use for CSV data encoded in base64. Perfect for tabular data, spreadsheets, or structured data exports.
  • content.markdown: Use for Markdown-formatted content. Great for documentation, README files, or any content that uses Markdown syntax.
  • content.files: Use for binary file attachments as an array of file objects. Each file object should contain at least a name and data field (base64 encoded).
  • content.layout: Use for structured document layouts as an array of layout objects. Useful for complex documents with sections, headers, or custom formatting.
Metadata Field:
  • misc (optional): Dictionary for storing custom metadata, additional properties, or app-specific information about the attachment.

Content Format Guidelines

For Text Content:
{
  "content": {
    "text": "This is plain text content that will be processed and indexed."
  }
}
For HTML Content:
{
  "content": {
    "html_base64": "PGgxPkhlbGxvIFdvcmxkPC9oMT4="
  }
}
For CSV Data:
{
  "content": {
    "csv_base64": "TmFtZSxBbW91bnQKSm9obiwxMDAKSmFuZSwyMDA="
  }
}
For Markdown:
{
  "content": {
    "markdown": "# Document Title\n\nThis is **markdown** content with formatting."
  }
}
For File Attachments:
{
  "content": {
    "files": [
      {
        "name": "document.pdf",
        "data": "JVBERi0xLjQKJcfsj6IKNSAwIG9iago8PAovVHlwZSAvUGFnZQovUGFyZW50IDMgMCBSCi9NZWRpYUJveCBbMCAwIDU5NSA4NDJdCi9SZXNvdXJjZXMgPDwKL0ZvbnQgPDwKL0YxIDIgMCBSCj4+Cj4+Ci9Db250ZW50cyA0IDAgUgo+PgplbmRvYmoK..."
      }
    ]
  }
}
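
The base64 values in the examples above can be produced with Python's standard `base64` module. The helper names below are hypothetical, and encoding text as UTF-8 before base64 is an assumption of this sketch. The `name` and `data` keys of a file object are the ones described under When to Use Each Field.

```python
import base64


def _b64(data: bytes) -> str:
    """Base64-encode bytes and return the result as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def html_content(html: str) -> dict:
    """Build a content object carrying HTML (assumed UTF-8)."""
    return {"html_base64": _b64(html.encode("utf-8"))}


def csv_content(csv_text: str) -> dict:
    """Build a content object carrying CSV text (assumed UTF-8)."""
    return {"csv_base64": _b64(csv_text.encode("utf-8"))}


def file_content(name: str, data: bytes) -> dict:
    """Build a content object carrying one binary file."""
    return {"files": [{"name": name, "data": _b64(data)}]}


print(html_content("<h1>Hello World</h1>"))
print(csv_content("Name,Amount\nJohn,100\nJane,200"))
```

The two printed values match the `html_base64` and `csv_base64` strings used in the examples above.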

Best Practices

  1. Choose the Right Format: Use the content field that best matches your data type for optimal processing.
  2. Base64 Encoding: Always base64-encode the values of html_base64, csv_base64, and each file object's data field.
  3. File Size Limits: Keep individual attachments under 10MB for optimal processing performance.
  4. Metadata Usage: Use the misc field to store app-specific metadata that might be useful for filtering or organization.
  5. Content Type Specification: Always specify content_type when possible to help with proper content processing.

Error Responses

All endpoints return consistent error responses following the standard format. For detailed error information, see our Error Responses documentation.

Authorizations

  • Authorization (string, header, required): Bearer token, sent as 'Authorization: Bearer <token>'.

Query Parameters

  • tenant_id (string, required)
  • sub_tenant_id (string, default: "")

Body

application/json · SourceModel · object[]
  • id (string, default: "")
  • title (string, default: "")
  • type (string, default: "")
  • description (string, default: "")
  • note (string, default: "")
  • url (string, default: "")
  • timestamp (string, default: "")
  • content (object)
  • tenant_metadata (object)
  • document_metadata (object)
  • meta (object)
  • attachments (AttachmentModel · object[])

Response

  • uploaded (FileUploadResult · object[], required)
  • message (string, required)
  • success (boolean, default: true)