Sync Settings
The Sync Settings page contains two configuration sections.
Document Chunking
Configure how documents are split into chunks for embedding and search:
| Setting | Description | Default | Range |
|---|---|---|---|
| Preserve Document Structure | When enabled, HTML and Markdown documents are split at header boundaries (h1, h2, etc.), preserving semantic structure | On | Toggle |
| Chunk Size | Target size for each chunk in tokens | 512 | 128-2048 |
| Chunk Overlap | Overlapping tokens between consecutive chunks | 64 | 0-256 |
| Semantic Splitting Threshold | Minimum document size, in tokens, for header-aware splitting to apply | 1000 | 100-5000 |
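The Chunk Size and Chunk Overlap settings can be illustrated with a fixed-size sliding window over a list of tokens. This is a simplified sketch, not the product's splitter, which also breaks at headers and separators as described under How Chunking Works. `chunk_tokens` is a hypothetical name, and whitespace splitting stands in for the real tokenizer. The sketch rejects an overlap at least as large as the chunk size; the documented ranges would allow, for example, a 128-token chunk with 256 tokens of overlap, and how the product handles that combination is not specified here.

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int = 512, overlap: int = 64
) -> list[list[str]]:
    """Split tokens into windows of at most chunk_size that share `overlap` tokens."""
    # This sketch requires overlap < chunk_size so that every window advances.
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not tokens:
        return []

    step = chunk_size - overlap  # how far each window's start moves forward
    chunks: list[list[str]] = []
    start = 0
    while True:
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokens[start:end])
        if end == len(tokens):  # this window reached the end of the document
            break
        start += step
    return chunks


# Usage, with whitespace splitting standing in for a real tokenizer (assumption).
words = "the quick brown fox jumps over the lazy dog".split()
for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
    print(chunk)
```

With a chunk size of 4 and an overlap of 1, the three windows are `the quick brown fox`, `fox jumps over the`, and `the lazy dog`, each sharing one token with its neighbor so that text near a boundary appears in both chunks.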
How Chunking Works
- HTML documents (Confluence, websites) are split at header tags (h1-h6)
- Markdown documents (Notion, .md files) are split at # headers
- Other documents (PDFs, text files) use recursive character-based splitting
- Header hierarchy is preserved in chunk metadata for search context
- Setting changes only affect documents synced after the change is made
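The header-aware path for Markdown can be sketched as follows. This is a minimal illustration, not the product's implementation, and several details are assumptions: it recognizes only ATX headings (`#` through `######`), stores the enclosing heading titles on each chunk in a hypothetical `header_path` field, treats the Semantic Splitting Threshold as an inclusive lower bound, and falls back to a single chunk where the real product would use size-based splitting. It does not special-case `#` lines inside fenced code blocks and does not re-split sections longer than Chunk Size.

```python
import re
from dataclasses import dataclass, field

# ATX heading such as "## Linux": group 1 is the #'s (level), group 2 the title.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


@dataclass
class Chunk:
    text: str
    # Titles of the enclosing headings, outermost first, e.g. ["Install", "Linux"].
    header_path: list[str] = field(default_factory=list)


def split_markdown_by_headers(markdown: str) -> list[Chunk]:
    """Split Markdown at heading lines, recording the heading hierarchy per chunk."""
    chunks: list[Chunk] = []
    stack: list[tuple[int, str]] = []  # (level, title) of the currently open headings
    buffer: list[str] = []

    def flush() -> None:
        body = "\n".join(buffer).strip()
        if body:
            chunks.append(Chunk(body, [title for _, title in stack]))
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the section that precedes this heading
            level = len(match.group(1))
            # Headings at the same or a deeper level are no longer ancestors.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, match.group(2)))
        else:
            buffer.append(line)
    flush()
    return chunks


def chunk_document(
    markdown: str, token_count: int, preserve_structure: bool = True, threshold: int = 1000
) -> list[Chunk]:
    """Hypothetical dispatcher: header-aware splitting only for large enough documents."""
    if preserve_structure and token_count >= threshold:
        return split_markdown_by_headers(markdown)
    # Placeholder fallback; the product would apply size-based splitting here instead.
    return [Chunk(markdown.strip())]


doc = "Intro text\n# Install\nStep one\n## Linux\napt install x\n## macOS\nbrew install x\n# Usage\nRun it"
for c in chunk_document(doc, token_count=5000):
    print(c.header_path, repr(c.text))
```

Keeping the heading path alongside each chunk is what lets a search result for `apt install x` be shown in the context of "Install > Linux" even though the chunk text itself contains no heading.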
Tip: Smaller chunks (128-256 tokens) are more precise for search but may lose context. Larger chunks (1024-2048 tokens) preserve more context but may be less focused. The default of 512 tokens provides a good balance for most use cases.
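For a rough sense of the cost side of this trade-off, assume a plain sliding window with the default 64-token overlap (the actual splitters also break at headers and separators, so real counts will differ). A document of N tokens with chunk size S and overlap O then yields about n chunks, each of which is embedded and stored. For a 10,000-token document:

```latex
\begin{align*}
n &= \left\lceil \frac{N - O}{S - O} \right\rceil \qquad (N \ge S) \\
S = 256:\quad n &= \lceil 9936 / 192 \rceil = 52 \\
S = 512:\quad n &= \lceil 9936 / 448 \rceil = 23 \\
S = 1024:\quad n &= \lceil 9936 / 960 \rceil = 11
\end{align*}
```

Under these assumptions, each halving of the chunk size roughly doubles the number of chunks, slightly more than double once overlap is counted.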
See Also
For details on the splitting algorithms, MIME type detection, token counting, and tuning recommendations, see Text Splitting Methodology.
Auto-Sync Settings
Configure automatic detection and syncing of stale documents:
| Setting | Description | Options |
|---|---|---|
| Enable Auto-Sync | Automatically check for stale documents on the schedule set by Check Frequency | On/Off |
| Check Frequency | How often to check for stale documents | Every 12 hours, Daily (24h), Weekly, Every 2 weeks, Monthly |
| Staleness Threshold | Number of days since a document's last sync after which it is considered stale | 1-30 days |
| When Documents Become Stale | Action to take when staleness is detected | Mark only, Auto-sync, Notify |
| Max Documents Per Run | Maximum number of documents processed in a single auto-sync run | 1-25 |
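One scheduled check can be sketched as a single function that runs at each Check Frequency interval. The names (`Document`, `StaleAction`, `run_stale_check`) are hypothetical, and several behaviors are assumptions rather than documented facts: a document counts as stale once its last sync is at least Staleness Threshold days old, every action also marks the document, Max Documents Per Run caps only the Auto-sync action, and the oldest documents are synced first.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum


class StaleAction(Enum):
    """The three options of "When Documents Become Stale"."""
    MARK_ONLY = "Mark only"
    AUTO_SYNC = "Auto-sync"
    NOTIFY = "Notify"


@dataclass
class Document:
    doc_id: str
    last_synced: datetime
    stale: bool = False


@dataclass
class CheckResult:
    stale_ids: list[str] = field(default_factory=list)   # every document found stale
    sync_ids: list[str] = field(default_factory=list)    # re-synced in this run
    notify_ids: list[str] = field(default_factory=list)  # included in a notification


def run_stale_check(
    docs: list[Document],
    now: datetime,
    threshold_days: int,
    action: StaleAction,
    max_per_run: int,
) -> CheckResult:
    """One scheduled check: find stale documents and apply the configured action."""
    if not 1 <= threshold_days <= 30:
        raise ValueError("Staleness Threshold must be between 1 and 30 days")
    if not 1 <= max_per_run <= 25:
        raise ValueError("Max Documents Per Run must be between 1 and 25")

    cutoff = now - timedelta(days=threshold_days)
    # Assumption: oldest first, so a capped run handles the most out-of-date documents.
    stale = sorted(
        (d for d in docs if d.last_synced <= cutoff), key=lambda d: d.last_synced
    )

    result = CheckResult(stale_ids=[d.doc_id for d in stale])
    for doc in stale:
        doc.stale = True  # assumption: every action also marks the document
    if action is StaleAction.AUTO_SYNC:
        # Assumption: the cap applies to syncing; the rest wait for a later run.
        result.sync_ids = [d.doc_id for d in stale[:max_per_run]]
    elif action is StaleAction.NOTIFY:
        result.notify_ids = list(result.stale_ids)
    return result


# Usage: ten documents last synced 0-9 days ago, 3-day threshold, at most 2 syncs per run.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
docs = [Document(f"doc-{i}", now - timedelta(days=i)) for i in range(10)]
result = run_stale_check(docs, now, threshold_days=3, action=StaleAction.AUTO_SYNC, max_per_run=2)
print(result.sync_ids)  # ['doc-9', 'doc-8']
```

In this sketch, the five stale documents left over by the cap stay marked as stale and would be picked up by a later check, which is why a low Max Documents Per Run paired with an infrequent Check Frequency can leave a backlog.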