Glossary

Term	Definition
Archived Project	A project that has been soft-deleted and hidden from normal views but can be restored. Archived projects do not count against project limits
Audit Metadata	Additional details stored with sync credit transactions for billing transparency: sync type (Initial/Manual/Auto), sync method (Standard/Enhanced), provider, and organization
Auto-Disable	Automatic deactivation of a folder monitor after 5 consecutive failures to prevent wasted API calls
Auto-Sync	The process of automatically syncing documents detected by folder monitors without manual intervention
Chunk	A segment of a document stored as a vector embedding. Size is configurable in tokens (default: 512)
Chunk Overlap	The number of tokens shared between consecutive chunks to preserve context across boundaries
Citation	A reference to the original source document, typically shown as a clickable link in RAG responses
cl100k_base	The tokenizer encoding used by GPT-4 and GPT-3.5-turbo. Vector Data Loader uses it for consistent token counting across all embedding providers
Connection Details	Public connection information (URL, keys, table names) for integrating external RAG apps with your vector store
Connector	An integration that connects to external platforms (Confluence, Google Drive, etc.)
Contextual Retrieval	An advanced re-sync technique where an LLM generates a context statement for each chunk before embedding, which is reported to reduce failed retrievals by up to ~49%
Credit	The billing unit in Vector Data Loader. 1 credit = $0.01. Operations consume credits based on complexity
Credit Balance	Total available credits combining startup credits and purchased credits
Credit Package	A bundle of credits available for one-time purchase. Packages offer volume discounts (17-44% savings)
Cron Job	A scheduled task that runs automatically at defined intervals (e.g., every 5 minutes for folder monitoring)
DOCX	Microsoft Word document format (.docx). Supported for direct upload and storage-based sources
Embedding	A numerical vector representation of text that captures semantic meaning
Enhanced Sync	Document sync using Contextual Retrieval, consuming 40 credits per document (2x standard) for improved retrieval accuracy
Folder Monitor	A configuration that watches a specific folder in a connected source for new files and automatically syncs them
Free Credits	See Startup Credits
Google Workspace	Google's suite of cloud-native document formats including Google Docs, Google Sheets, and Google Slides
Header-Aware Splitting	A text splitting strategy that detects header tags (HTML) or markdown headers and splits documents at section boundaries while preserving the header hierarchy in chunk metadata
Integration IDs	Organization ID and Project ID displayed in Connection Details, used when configuring external tools like n8n or LangChain to scope vector store queries
Lifetime Deal (LTD)	A one-time purchase that grants a monthly credit allocation permanently. Available in Personal (500/mo), Business (2,000/mo), and Agency (6,000/mo) tiers
LLM	Large Language Model - an AI model trained on vast text data to understand and generate human-like text (e.g., GPT-4, Claude)
Magic Link	A secure, single-use URL sent via email for passwordless authentication
OAuth	Secure authorization protocol used for third-party integrations (Google, GitHub, Confluence, etc.)
Organization	A workspace container for team collaboration, settings, and billing
Parse Quality Score	A confidence score (0-100%) indicating how well text was extracted from a document
Parser	A component that extracts text content from binary files (PDFs, images, Office documents)
pgvector	PostgreSQL extension for vector similarity search
Playground	An interface for testing semantic search and RAG chat against indexed documents
Polling	A technique where the system periodically checks for changes rather than waiting for push notifications
Purchased Credits	Credits bought via credit packages or received via Lifetime Deal monthly allocation. Purchased credits never expire
RAG	Retrieval-Augmented Generation - using retrieved documents to enhance LLM responses
Recursive Text Splitter	The default text splitting algorithm that splits content at natural language boundaries (paragraphs, sentences, words) using a hierarchy of separators. Used for plain text, PDFs, and other unstructured content
Re-Sync	The process of reprocessing a document by deleting its existing chunks, re-fetching content, re-chunking, and re-embedding
Role	A user's permission level within an organization (Owner, Admin, or Member)
Semantic Search	Finding documents based on meaning rather than exact keyword matches
Semantic Threshold	The minimum document size (in tokens) required for header-aware splitting to activate. Documents smaller than this threshold use the simpler recursive splitter to avoid over-fragmentation
Signed URL	A time-limited, secure URL that provides temporary access to private storage files
Similarity Score	A percentage (0-100%) indicating how closely a search result matches the query semantically
Source URL	The direct link to the original document in its source system, stored with each chunk for RAG citation
Standard Sync	Regular document sync consuming 20 credits per document
Startup Credits	A one-time grant of 250 free credits given to every new organization on signup. These do not renew or reset
Storage Credits	Monthly charge of 200 credits per GiB for files stored via the Upload connector
Streaming	Real-time delivery of LLM responses word-by-word as they are generated
Sync	The process of extracting content from a source and loading it into the vector store
Temperature	An LLM parameter controlling response creativity (0.0 = focused, 2.0 = highly creative)
tiktoken	OpenAI's library for tokenizing text. Vector Data Loader uses its JavaScript port, js-tiktoken, with the cl100k_base encoding to count tokens accurately and consistently for chunk sizing
Token	A unit of text processing, roughly 4 characters or 0.75 words in English
Top K	The number of most similar results to return from a vector search
Transaction History	A chronological record of all credit transactions viewable in Settings > Billing, including purchases, syncs, and monthly billing events with audit metadata
Vector Store	A database optimized for storing and querying vector embeddings
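
The credit rates defined above (Credit, Standard Sync, Enhanced Sync, Storage Credits) can be combined into a rough monthly estimate. The sketch below is illustrative only: the function names are hypothetical and not part of Vector Data Loader, and it assumes storage is billed linearly per GiB with no rounding, which the glossary does not specify.

```python
# Rates taken from the glossary entries above.
CREDIT_VALUE_USD = 0.01        # Credit: 1 credit = $0.01
STANDARD_SYNC_CREDITS = 20     # Standard Sync: per document
ENHANCED_SYNC_CREDITS = 40     # Enhanced Sync: per document (2x standard)
STORAGE_CREDITS_PER_GIB = 200  # Storage Credits: per GiB per month


def estimate_monthly_credits(standard_docs: int, enhanced_docs: int, stored_gib: float) -> float:
    """Hypothetical helper: estimated credits for one month of usage.

    Assumption: storage scales linearly with size and is not rounded
    up to whole GiB. Actual billing may differ.
    """
    sync_credits = (
        standard_docs * STANDARD_SYNC_CREDITS
        + enhanced_docs * ENHANCED_SYNC_CREDITS
    )
    storage_credits = stored_gib * STORAGE_CREDITS_PER_GIB
    return sync_credits + storage_credits


def credits_to_usd(credits: float) -> float:
    """Convert credits to US dollars at the glossary rate, rounded to cents."""
    return round(credits * CREDIT_VALUE_USD, 2)


# Example: 100 standard syncs, 10 enhanced syncs, 1.5 GiB of uploaded files.
credits = estimate_monthly_credits(standard_docs=100, enhanced_docs=10, stored_gib=1.5)
print(credits, credits_to_usd(credits))
```

Under these assumptions, a workload like the one in the example would use up the 250 startup credits after about a dozen standard syncs, so an estimate of this kind mainly helps when choosing a credit package or Lifetime Deal tier.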