Sync Settings

The Sync Settings page contains two configuration sections: Document Chunking and Auto-Sync Settings.

Document Chunking

Configure how documents are split into chunks for embedding and search:

Setting	Description	Default	Range
Preserve Document Structure	When enabled, HTML and Markdown documents are split at header boundaries (h1, h2, etc.), preserving semantic structure	On	Toggle
Chunk Size	Target size for each chunk in tokens	512	128-2048
Chunk Overlap	Overlapping tokens between consecutive chunks	64	0-256
Semantic Splitting Threshold	Minimum document size (tokens) for header-aware splitting	1000	100-5000
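
To illustrate how Chunk Size and Chunk Overlap interact, here is a minimal sketch of fixed-size splitting with overlap. It is not the product's implementation: the function name `chunk_tokens` is hypothetical, tokens are modeled as a plain list of strings, and the check that the overlap must be smaller than the chunk size is an assumption of this sketch.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 512, overlap: int = 64) -> List[List[str]]:
    """Split tokens into windows of at most chunk_size tokens.

    Each window after the first starts with the last `overlap` tokens
    of the previous window.
    """
    # Ranges taken from the settings table above.
    if not 128 <= chunk_size <= 2048:
        raise ValueError("chunk_size must be within 128-2048")
    if not 0 <= overlap <= 256:
        raise ValueError("overlap must be within 0-256")
    # Assumption of this sketch: the window must advance on every step.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    step = chunk_size - overlap
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window reached the end of the document
        start += step
    return chunks
```

With the defaults, each new chunk advances by 512 - 64 = 448 tokens, so a 1000-token document yields three chunks starting at tokens 0, 448, and 896.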

How Chunking Works

HTML documents (Confluence, websites) are split at header tags (h1-h6)
Markdown documents (Notion, .md files) are split at # headers
Other documents (PDFs, text files) use recursive character-based splitting
Header hierarchy is preserved in chunk metadata for search context
Changes to these settings only affect newly synced documents
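
The header-aware splitting described above can be sketched for the Markdown case as follows. This is an illustration under stated assumptions, not the actual splitter: the function name `split_markdown_by_headers` and the `headers`/`text` keys are hypothetical, and the sketch does not handle `#` lines inside fenced code blocks or apply the chunk size limit.

```python
import re
from typing import Dict, List, Tuple

# Matches ATX-style Markdown headers such as "# Title" or "### Section".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_by_headers(text: str) -> List[Dict[str, object]]:
    """Split Markdown at header lines and record the header path of each chunk."""
    chunks: List[Dict[str, object]] = []
    path: List[Tuple[int, str]] = []  # (level, title) of the enclosing headers
    body: List[str] = []

    def flush() -> None:
        # Emit the text collected so far under the current header path.
        content = "\n".join(body).strip()
        if content:
            chunks.append({"headers": [title for _, title in path], "text": content})
        body.clear()

    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header closes every header at the same or a deeper level.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks
```

Storing the full header path with each chunk, rather than only the nearest header, is one way to keep the hierarchy available as search context.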

Tip: Smaller chunks (128-256 tokens) are more precise for search but may lose context. Larger chunks (1024-2048 tokens) preserve more context but may be less focused. 512 tokens provides a good balance for most use cases.

Image Extraction (Enhanced Sync)

When using Enhanced sync with a configured Mistral API key, images are automatically extracted from documents during processing:

Automatic extraction: Images found in documents are extracted and stored in organization-scoped storage
AI annotations: Each image receives AI-generated annotations including type classification (photograph, chart, diagram), short description, detailed summary, and key data points
Searchable content: Annotation text is appended to relevant document chunks, making visual content searchable with any embedding provider
Sync metrics: Image extraction counts appear in the sync completion summary
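
As a rough illustration of how annotation text can make an image searchable, the sketch below appends annotation fields to a chunk's text before embedding. The class `ImageAnnotation`, its field names, and the output layout are all hypothetical; the source only states which kinds of annotation are generated, not how they are stored or formatted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImageAnnotation:
    # Hypothetical field names mirroring the annotation parts listed above.
    image_type: str            # e.g. "photograph", "chart", "diagram"
    short_description: str
    detailed_summary: str
    key_data_points: List[str] = field(default_factory=list)


def annotation_text(annotation: ImageAnnotation) -> str:
    """Render one annotation as plain text (the layout is an assumption)."""
    parts = [
        f"[Image: {annotation.image_type}] {annotation.short_description}",
        annotation.detailed_summary,
    ]
    if annotation.key_data_points:
        parts.append("Key data points: " + "; ".join(annotation.key_data_points))
    return "\n".join(parts)


def append_annotations(chunk_text: str, annotations: List[ImageAnnotation]) -> str:
    """Return the chunk text with annotation text appended after it."""
    if not annotations:
        return chunk_text
    rendered = "\n\n".join(annotation_text(a) for a in annotations)
    return chunk_text + "\n\n" + rendered
```

Because the result is ordinary text, any text embedding model can index it, which is why the feature does not depend on a multimodal provider.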

Works with All Providers

Image extraction and annotation work with all embedding providers (OpenAI, Cohere, Gemini, Ollama). You do not need Gemini to benefit from image annotations: the annotation text makes images searchable regardless of your embedding model.

For full details on multimodal capabilities, see Multimodal Embeddings.

Auto-Sync Settings

Configure automatic detection and syncing of stale documents:

Setting	Description	Options
Enable Auto-Sync	Automatically check for stale documents based on your frequency setting	On/Off
Check Frequency	How often to check for stale documents	Every 12 hours, Daily (24h), Weekly, Every 2 weeks, Monthly
Staleness Threshold	Days without sync before a document is considered stale	1-30 days
When Documents Become Stale	Action to take when staleness is detected	Mark only, Auto-sync, Notify
Max Documents Per Run	Maximum number of documents processed in a single auto-sync run	1-25
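
The way Staleness Threshold and Max Documents Per Run combine can be sketched as below. This is a hypothetical illustration, not the scheduler's actual logic: the function name `find_stale` is invented, a document is treated as stale once its last sync is at least the threshold number of days old, and picking the oldest documents first is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def find_stale(
    last_synced: Dict[str, datetime],
    threshold_days: int,
    max_per_run: int,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return up to max_per_run document IDs whose last sync is too old."""
    # Ranges taken from the settings table above.
    if not 1 <= threshold_days <= 30:
        raise ValueError("threshold_days must be within 1-30")
    if not 1 <= max_per_run <= 25:
        raise ValueError("max_per_run must be within 1-25")

    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=threshold_days)
    stale = [doc_id for doc_id, synced_at in last_synced.items() if synced_at <= cutoff]
    # Assumption: the longest-unsynced documents are handled first.
    stale.sort(key=lambda doc_id: last_synced[doc_id])
    return stale[:max_per_run]
```

Documents beyond the per-run limit remain stale and would be picked up by a later check.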
