Unsupported File Formats: If you attempt to upload a file format that is not supported, you will receive an error response with status code 400 and the message: "Unsupported file format: [filename]. Please check our supported file formats documentation."
Ensure your files are in one of the supported formats listed below before uploading.
Word Processing and Presentations
Office documents and presentation files:
- ODT (OpenDocument Text)
- RTF (Rich Text Format)
- DOC
- DOCX (Office Open XML)
- DOCM, DOT, DOTM
- Pages
- CWK, LWP, MW, MCW, HWP, PBD, zabw
- WPD, WPS
- XML
- Key
- PPT, PPTX, PPTM, POT, POTM, POTX, SDD, SDA, SDP, SGL, STI, SXI, STW, SXW, SXG, UOP
Spreadsheets
Data and spreadsheet formats:
- XLSX
- XLS
- XLSM, XLSB, XLW, CSV, TSV, DIF, SYLK, SLK, PRN, NUMBERS, ET, ODS, FODS, UOS1, UOS2, DBF, WK1, WK2, WK3, WK4, WKS, 123, WQ1, WQ2, WB1, WB2, WB3, QPW, XLR, ETH
Markup and Documentation
These formats are ideal for documentation, wikis, and structured text content:
- Markdown (all variants: CommonMark, GitHub-Flavored, MultiMarkdown, Markdown Extra, Djot)
- HTML, XHTML
- LaTeX
- reStructuredText
- Textile
- Emacs Org-Mode
- Emacs Muse
- OPML
- DocBook
- AsciiDoc
- MediaWiki, TWiki, DokuWiki, TikiWiki, Vimwiki, Muse Wiki
- JATS, BITS
- FictionBook2 (FB2)
- GNU TexInfo
- Haddock markup
- TEI Simple
- Typst
- pod, man (roff), mdoc
- Creole, Jira wiki markup
- Markua, txt2tags
- Plain text (TXT)
Images
Image files with text content:
- JPEG, JPG, PNG, GIF, BMP, SVG, TIFF, WEBP
Image Processing: Image files are processed using OCR (Optical Character Recognition) to extract text content. Ensure images have clear, readable text for best results.
Ebooks
Digital book formats:
- EPUB
- FictionBook2 (FB2)
Data/Bibliography/Serialization
Structured data and reference formats:
- BibTeX/BibLaTeX/CSL JSON/CSL YAML
- RIS
- EndNote XML
- Jupyter notebook (ipynb)
- JSON
- XML
- Haskell AST
Others (Special Formats)
Specialized and niche formats:
- InDesign ICML
- XWiki, ZimWiki
- Beamer, Slidy, Slideous, S5, DZSlides, reveal.js (presentation/slideshow formats)
- ANSI-formatted text
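The format lists above can be mirrored in a client-side pre-check so that obviously unsupported files are caught before an upload is attempted. The following is a minimal sketch, not part of the Cortex API: `SUPPORTED_EXTENSIONS`, `ASSUMED_EXTENSIONS`, and `is_supported` are hypothetical names, the set is only a partial subset of the lists above, and the mapping from format names to extensions such as `md`, `html`, `tex`, and `rst` is an assumption. The server's response remains the authority on what is accepted.

```python
from pathlib import Path

# Partial subset of the extensions named in the lists above (extend as needed).
SUPPORTED_EXTENSIONS = {
    "odt", "rtf", "doc", "docx", "ppt", "pptx",
    "xlsx", "xls", "csv", "tsv",
    "jpeg", "jpg", "png", "gif", "bmp", "svg", "tiff", "webp",
    "epub", "fb2", "json", "xml", "txt", "ipynb", "ris", "opml",
}

# ASSUMPTION: common extensions for formats the lists name only by format
# (Markdown, HTML, LaTeX, reStructuredText). Not confirmed by the source.
ASSUMED_EXTENSIONS = {"md", "html", "tex", "rst"}


def is_supported(filename: str) -> bool:
    """Return True if the file's extension is in the local allow-list."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS or suffix in ASSUMED_EXTENSIONS
```

A call such as `is_supported("report.DOCX")` is case-insensitive, and a file with no extension is treated as unsupported.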
File Size and Processing Limits
File Size Limits
- Maximum file size: 50MB per file
- Recommended size: Under 10MB for optimal processing speed
- Large files: Files over 100 pages may take longer to process
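The size limits above can be checked locally before uploading. This is a minimal sketch under one stated assumption: the documentation does not say whether "MB" means 10^6 or 2^20 bytes, so the constants below assume 2^20. The name `classify_size` and its return labels are hypothetical.

```python
# ASSUMPTION: 1MB = 1024 * 1024 bytes; the source does not specify the unit.
MAX_BYTES = 50 * 1024 * 1024          # documented maximum: 50MB per file
RECOMMENDED_BYTES = 10 * 1024 * 1024  # documented recommendation: under 10MB


def classify_size(size_bytes: int) -> str:
    """Classify a file size against the documented limits."""
    if size_bytes > MAX_BYTES:
        return "too_large"   # exceeds the 50MB maximum
    if size_bytes >= RECOMMENDED_BYTES:
        return "slow"        # allowed, but above the recommended size
    return "ok"              # under 10MB
```

In practice the size would come from something like `os.path.getsize(path)` before the upload request is made.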
Processing Time
Cortex is optimized for speed and typically processes files much faster than traditional solutions:
- Small files (< 10 pages): Under 2 minutes (typically 30 seconds to 1 minute)
- Medium files (10-100 pages): 2-5 minutes (typically 1-3 minutes)
- Large files (100+ pages): 5-15 minutes (typically 3-8 minutes)
Performance Note: Cortex’s processing times are significantly faster than traditional document processing solutions. Most files under 10 pages are processed in under 2 minutes, with many completing in under 1 minute.
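If a client waits for processing to finish, the upper bounds above can serve as a rough timeout. The sketch below is illustrative only: `max_processing_seconds` is a hypothetical helper, and because the documented ranges overlap at exactly 100 pages, it assumes a 100-page file counts as medium.

```python
def max_processing_seconds(page_count: int) -> int:
    """Return the documented upper-bound processing time, in seconds."""
    if page_count < 10:
        return 2 * 60    # small files: under 2 minutes
    if page_count <= 100:
        return 5 * 60    # medium files: 2-5 minutes (100 pages assumed medium)
    return 15 * 60       # large files: 5-15 minutes
```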
Best Practices
File Preparation
- Ensure files are not password-protected
- Use clear, well-structured documents for better parsing
- Avoid corrupted or damaged files
- Use descriptive filenames for easier identification
Format-Specific Tips
PDF Files:
- Use text-based PDFs rather than scanned images when possible
- Ensure proper text encoding (UTF-8 preferred)
- Avoid password-protected or encrypted PDFs
Office Documents:
- Save in the latest format version when possible
- Remove unnecessary formatting that might interfere with parsing
- Ensure all content is accessible (not hidden or in comments only)
Images:
- Use high-resolution images with clear, readable text
- Ensure good contrast between text and background
- Avoid heavily stylized fonts that might be difficult to OCR
Markdown Files:
- Use standard Markdown syntax for best compatibility
- Include proper heading structure for better chunking
- Use consistent formatting throughout the document
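Two of the tips above, UTF-8 encoding and proper heading structure, can be checked mechanically before upload. The following is a rough sketch with hypothetical helper names (`is_utf8`, `heading_level_jumps`). It only recognizes `#`-style Markdown headings, skips fenced code blocks, and treats a jump of more than one level (for example `#` followed directly by `###`) as a structural problem, which is an assumption about what "proper heading structure" means here.

```python
import re

FENCE = "`" * 3                       # Markdown code-fence marker
HEADING = re.compile(r"^(#{1,6})\s+\S")


def is_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def heading_level_jumps(markdown_text: str) -> list[int]:
    """Return 1-based line numbers of headings that skip a level."""
    jumps = []
    previous = 0          # level of the last heading seen (0 = none yet)
    in_fence = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence   # toggle on opening/closing fence
            continue
        if in_fence:
            continue                  # ignore '#' lines inside code blocks
        match = HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        if previous and level > previous + 1:
            jumps.append(number)
        previous = level
    return jumps
```

For a document with the headings `# A`, `## B`, `#### C` on consecutive lines, the function reports line 3, where the level jumps from 2 to 4.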
Troubleshooting
Common Issues
File Not Processing:
- Check if the file format is supported (see list above)
- Verify the file is not corrupted or password-protected
- Ensure the file size is under 50MB
- Check your API key permissions
- For images: ensure high resolution and clear text
- For PDFs: use text-based rather than scanned PDFs
- For office docs: avoid complex formatting or embedded objects
- Check the file format against the supported list
- Verify file integrity (not corrupted)
- Ensure proper file permissions
- Contact support with the file_id if issues persist
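When troubleshooting, it can help to tell a format rejection apart from other failures. The sketch below matches the status code 400 and the error message quoted at the top of this section. It assumes `[filename]` in that message is a placeholder replaced by the actual file name, and that the message text arrives as a plain string; `unsupported_filename` is a hypothetical helper, not part of the Cortex API.

```python
import re
from typing import Optional

# Pattern built from the documented message; "[filename]" is assumed to be
# a placeholder for the real file name.
UNSUPPORTED = re.compile(
    r"^Unsupported file format: (?P<name>.+)\. "
    r"Please check our supported file formats documentation\.$"
)


def unsupported_filename(status_code: int, message: str) -> Optional[str]:
    """Return the rejected file name, or None if this is a different error."""
    if status_code != 400:
        return None
    match = UNSUPPORTED.match(message)
    return match.group("name") if match else None
```

A `None` result means the failure is not the documented format error, so the remaining checks (size, corruption, permissions) are the next things to look at.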
Need Help? If you’re experiencing issues with file processing or have questions about format support, contact our support team at founders@usecortex.ai with the file_id for assistance.