Sync Reliability & Recovery
Vector Data Loader uses a job-based sync system with built-in retry logic, progress checkpointing, and automatic recovery. This makes your sync operations resilient to temporary failures such as timeouts, rate limits, and network interruptions.
How Sync Jobs Work
Every sync operation — whether triggered manually, via bulk resync, or by auto-sync — creates a sync job that processes documents in the background.
Job Lifecycle
| Stage | What Happens |
|---|---|
| Created | Job is created with a list of document IDs to process |
| Running | Documents are processed one at a time, with progress saved after each |
| Checkpointed | If the job approaches a timeout, progress is saved and processing continues automatically |
| Completed | All documents processed successfully |
| Completed with Errors | Some documents succeeded, some failed |
| Failed | All documents failed to sync |
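The lifecycle above can be sketched as a small state model. This is an illustrative sketch, not the product's actual implementation; the status names and the rule for deriving a terminal status are assumptions based on the table.

```python
from enum import Enum

class JobStatus(Enum):
    CREATED = "created"
    RUNNING = "running"
    CHECKPOINTED = "checkpointed"
    COMPLETED = "completed"
    COMPLETED_WITH_ERRORS = "completed_with_errors"
    FAILED = "failed"

def terminal_status(succeeded: int, failed: int) -> JobStatus:
    """Derive a job's final status from its per-document outcomes."""
    if failed == 0:
        return JobStatus.COMPLETED              # all documents processed successfully
    if succeeded == 0:
        return JobStatus.FAILED                 # all documents failed to sync
    return JobStatus.COMPLETED_WITH_ERRORS      # mixed outcome
```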
Progress Tracking
When you start a sync, a Sync Progress Modal appears showing real-time progress:
- Current document being processed
- Success and failure counts
- Overall completion percentage
- Error details for any failed documents
You can close the modal and continue working — the sync continues in the background. Reopen the Documents page to see updated statuses.
You can sync up to 20 documents at once. For 5 or fewer documents, sync begins immediately. For 6-20 documents, processing runs in the background.
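These limits imply a simple dispatch rule, sketched below. The thresholds (20 documents maximum, 5 or fewer for immediate processing) come from the text above; the constant and function names are hypothetical.

```python
MAX_DOCS_PER_SYNC = 20       # documented hard limit per sync job
IMMEDIATE_THRESHOLD = 5      # 5 or fewer documents sync immediately

def dispatch_mode(doc_count: int) -> str:
    """Decide whether a sync runs immediately or in the background."""
    if doc_count < 1:
        raise ValueError("select at least one document")
    if doc_count > MAX_DOCS_PER_SYNC:
        raise ValueError(f"at most {MAX_DOCS_PER_SYNC} documents per sync")
    return "immediate" if doc_count <= IMMEDIATE_THRESHOLD else "background"
```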
Automatic Retry Logic
When a document fails to sync, the system classifies the error and decides whether to retry automatically.
Error Classification
| Category | Examples | Retried? | Max Attempts |
|---|---|---|---|
| Transient | Timeouts, rate limits (429), server errors (500/502/503/504), network failures | Yes | 5 |
| Permanent | Unauthorized (401), forbidden (403), not found (404), insufficient credits | No | 0 |
| Unknown | Unrecognized errors that don't match known patterns | Yes | 3 |
Transient Errors (Auto-Retried)
These are temporary issues that usually resolve on their own:
- Connection timeouts — the source server was slow to respond
- Rate limiting (429) — too many API requests in a short period
- Server errors (500, 502, 503, 504) — the source service had a temporary outage
- Network failures — brief connectivity interruption
The system uses exponential backoff between retry attempts, starting at 30 seconds and doubling each time (up to 1 hour maximum). A small random delay (jitter) is added to prevent multiple retries from colliding.
| Attempt | Approximate Wait |
|---|---|
| 1st retry | ~30 seconds |
| 2nd retry | ~1 minute |
| 3rd retry | ~2 minutes |
| 4th retry | ~4 minutes |
| 5th retry | ~8 minutes |
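The schedule above is standard exponential backoff with additive jitter. A minimal sketch follows; the 30-second base, doubling, and 1-hour cap come from the text, while the 10% jitter fraction is an assumption for illustration.

```python
import random

BASE_DELAY = 30.0     # seconds before the first retry
MAX_DELAY = 3600.0    # delays are capped at 1 hour

def retry_delay(attempt: int, jitter_fraction: float = 0.1) -> float:
    """Wait time before retry `attempt` (1-based): doubles from 30 s,
    capped at 1 hour, plus a small random jitter to avoid collisions."""
    base = min(BASE_DELAY * 2 ** (attempt - 1), MAX_DELAY)
    return base + random.uniform(0, jitter_fraction * base)
```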
Permanent Errors (Not Retried)
These indicate a problem that won't resolve on its own:
- Authentication expired — reconnect the source in Settings
- Document deleted — the source document no longer exists
- Insufficient credits — purchase more credits in Settings > Billing
- Configuration missing — set up your embedding or vector store provider
Permanent errors require you to fix the underlying issue (reconnect a source, add credits, etc.) and then manually re-sync the affected documents.
Crash Recovery
If a sync job is interrupted — for example, due to a server restart or function timeout — the system automatically recovers.
How Recovery Works
- Progress is checkpointed after each document, so completed work is never lost
- A background health check runs periodically to detect interrupted jobs
- Interrupted jobs are automatically resumed from where they left off
- Each job can be resumed up to 3 times before being marked as failed
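Checkpoint-and-resume can be sketched as follows. The per-document checkpointing and the resume budget of 3 mirror the behavior described above; the checkpoint shape and function names are hypothetical.

```python
MAX_RESUMES = 3  # a job may be resumed at most 3 times before it is marked failed

def run_with_checkpoint(doc_ids, sync_one, checkpoint):
    """Process documents, skipping any already recorded in the checkpoint.

    `checkpoint` is a dict like {"done": set(), "resumes": 0}. Progress is
    recorded after every document, so an interruption never loses completed work.
    """
    if checkpoint["resumes"] > MAX_RESUMES:
        raise RuntimeError("job exceeded its resume budget; marked as failed")
    for doc_id in doc_ids:
        if doc_id in checkpoint["done"]:
            continue                        # already synced before the interruption
        sync_one(doc_id)                    # may raise on crash or timeout
        checkpoint["done"].add(doc_id)      # checkpoint after each document
    return checkpoint
```

On resume, documents already in the checkpoint are skipped, which is why previously synced documents keep their "Synced" status.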
What You'll See
- Documents that were already synced before the interruption keep their "Synced" status
- Remaining documents are retried automatically
- If a job cannot be resumed, it's marked as failed and you can start a new sync
Crash recovery never requires manual intervention; the system handles it automatically. If a job appears stuck in "Running" status for more than an hour, the next health check will clean it up.
Sync Health Dashboard
The Dashboard includes a Sync Health panel that gives you visibility into your sync reliability over the past 7 days:
| Metric | Description |
|---|---|
| Job Outcomes | Count of succeeded, partially succeeded, and failed sync jobs |
| Document Success Rate | Percentage of documents that synced successfully |
| Average Duration | Typical time for a sync job to complete |
| Errors by Source | Which connectors are experiencing the most errors |
| Top Errors | Most frequent error messages across all sync jobs |
Use this panel to spot patterns — for example, if a particular source connector is consistently failing, you may need to reconnect it or check its API credentials.
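As an illustration, the Document Success Rate metric is a straightforward ratio over the 7-day window. This sketch is an assumption about how it could be computed; the rounding and empty-window behavior are not specified by the dashboard.

```python
def document_success_rate(succeeded: int, failed: int) -> float:
    """Percentage of documents that synced successfully over the window."""
    total = succeeded + failed
    if total == 0:
        return 100.0  # no syncs in the window: nothing has failed
    return round(100.0 * succeeded / total, 1)
```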
Limits & Constraints
| Limit | Value | Purpose |
|---|---|---|
| Max documents per sync | 20 | Prevents oversized jobs |
| Max resume attempts | 3 | Prevents infinite retry loops |
| Stale job threshold | 1 hour | Jobs running longer are considered stuck |
| Max retry attempts (transient) | 5 | Generous retries for temporary issues |
| Max retry attempts (unknown) | 3 | Moderate retries for unclassified errors |
| Concurrent jobs | 1 per organization | Prevents resource contention |
Best Practices
- Start small — When syncing a new source for the first time, sync a few documents first to verify your configuration works
- Check the Sync Health panel — After bulk operations, review the dashboard to spot any failures
- Fix permanent errors promptly — Expired OAuth tokens or missing API keys won't resolve on their own
- Don't re-sync during an active job — Wait for the current job to complete before starting another
- Use auto-sync for ongoing maintenance — Configure auto-sync in Settings to keep documents fresh automatically