Sync Settings

The Sync Settings page contains two configuration sections: Document Chunking and Auto-Sync Settings.

Document Chunking

Configure how documents are split into chunks for embedding and search:

| Setting | Description | Default | Range |
|---|---|---|---|
| Preserve Document Structure | When enabled, HTML and Markdown documents are split at header boundaries (h1, h2, etc.), preserving semantic structure | On | Toggle |
| Chunk Size | Target size for each chunk, in tokens | 512 | 128-2048 |
| Chunk Overlap | Number of tokens shared between consecutive chunks | 64 | 0-256 |
| Semantic Splitting Threshold | Minimum document size (in tokens) for header-aware splitting | 1000 | 100-5000 |
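
As a quick illustration of the defaults and ranges in the table, the following is a minimal sketch of a settings object with range validation. The class name, field names, and `validate` method are hypothetical and are not part of the product; only the numeric defaults and ranges come from the table.

```python
from dataclasses import dataclass

# Allowed ranges copied from the settings table.
# The dictionary keys are hypothetical field names, not product identifiers.
RANGES = {
    "chunk_size": (128, 2048),
    "chunk_overlap": (0, 256),
    "semantic_threshold": (100, 5000),
}


@dataclass
class ChunkingSettings:
    """Hypothetical container mirroring the Document Chunking settings."""

    preserve_structure: bool = True   # "Preserve Document Structure" toggle
    chunk_size: int = 512             # tokens
    chunk_overlap: int = 64           # tokens
    semantic_threshold: int = 1000    # tokens

    def validate(self) -> list[str]:
        """Return one message per value that falls outside its range."""
        errors = []
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                errors.append(f"{name}={value} is outside {low}-{high}")
        return errors
```

With the defaults, `ChunkingSettings().validate()` returns an empty list; a value such as `chunk_size=64` produces a single error message.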

How Chunking Works

  • HTML documents (Confluence, websites) are split at header tags (h1-h6)
  • Markdown documents (Notion, .md files) are split at # headers
  • Other documents (PDFs, text files) use recursive character-based splitting
  • Header hierarchy is preserved in chunk metadata for search context
  • Changes to these settings affect only newly synced documents
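
To make the header-aware behavior above more concrete, here is a simplified sketch of splitting a Markdown document at # headers while recording the header hierarchy for each chunk. It is an illustration only, not the product's splitter: the function name and the `headers`/`text` keys are hypothetical, and it ignores edge cases such as # characters inside fenced code blocks.

```python
import re

# Matches ATX-style Markdown headers such as "# Title" or "### Sub-section".
HEADER = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown(text: str) -> list[dict]:
    """Split Markdown at headers, attaching the header path to each chunk."""
    chunks = []
    path = []   # stack of (level, title) for the headers currently in scope
    body = []   # lines collected since the last header

    def flush():
        content = "\n".join(body).strip()
        if content:
            chunks.append({"headers": [title for _, title in path], "text": content})
        body.clear()

    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            flush()  # close the previous section under its own header path
            level = len(match.group(1))
            # A new header replaces any header at the same or a deeper level.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks
```

For example, a section "Install it" under "## Setup" inside "# Guide" would be returned with `headers` equal to `["Guide", "Setup"]`, which is the kind of hierarchy that can be kept in chunk metadata for search context.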
Tip: Smaller chunks (128-256 tokens) are more precise for search but may lose context. Larger chunks (1024-2048 tokens) preserve more context but may be less focused. 512 tokens provides a good balance for most use cases.
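
To see how chunk size and overlap interact, the sketch below applies a plain sliding window over a list of tokens. This is a deliberately simpler technique than the recursive character-based splitting mentioned above, and the function name is hypothetical. The check that overlap is smaller than chunk size is a requirement of this sketch, not a documented product rule.

```python
def window_chunks(tokens: list[str], chunk_size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if overlap >= chunk_size:
        # Without this guard the window would never advance.
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks
```

With the default settings, a 1000-token document yields three chunks starting at tokens 0, 448, and 896, and the last 64 tokens of each chunk repeat as the first 64 tokens of the next.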

See Also

For a comprehensive deep dive into the splitting algorithms, MIME type detection, token counting, and tuning recommendations, see Text Splitting Methodology.

Auto-Sync Settings

Configure automatic detection and syncing of stale documents:

| Setting | Description | Options |
|---|---|---|
| Enable Auto-Sync | Automatically check for stale documents based on your frequency setting | On/Off |
| Check Frequency | How often to check for stale documents | Every 12 hours, Daily (24h), Weekly, Every 2 weeks, Monthly |
| Staleness Threshold | Days without sync before a document is considered stale | 1-30 days |
| When Documents Become Stale | Action to take when staleness is detected | Mark only, Auto-sync, Notify |
| Max Documents Per Run | Maximum number of documents processed per auto-sync run | 1-25 |
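
As a rough illustration of how Staleness Threshold and Max Documents Per Run could combine, the sketch below picks the documents to handle in one run. The function name, the `last_synced` mapping, and the oldest-first ordering are assumptions made for this example; the page does not describe how the product actually orders or selects stale documents.

```python
from datetime import datetime, timedelta, timezone


def select_stale(
    last_synced: dict[str, datetime],
    now: datetime,
    threshold_days: int,
    max_per_run: int,
) -> list[str]:
    """Return IDs of documents not synced within threshold_days, capped per run."""
    cutoff = now - timedelta(days=threshold_days)
    stale = [doc_id for doc_id, synced_at in last_synced.items() if synced_at < cutoff]
    # Assumption: handle the longest-unsynced documents first.
    stale.sort(key=lambda doc_id: last_synced[doc_id])
    return stale[:max_per_run]


# Example usage with made-up document IDs and timestamps.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
docs = {
    "a": now - timedelta(days=40),
    "b": now - timedelta(days=2),
    "c": now - timedelta(days=10),
    "d": now - timedelta(days=20),
}
picked = select_stale(docs, now, threshold_days=7, max_per_run=2)
```

In this example three documents exceed the 7-day threshold, but only the two oldest are returned because of the per-run limit.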