| Archived Project | A project that has been soft-deleted and hidden from normal views but can be restored. Archived projects do not count against project limits |
| Audit Metadata | Additional details stored with sync credit transactions for billing transparency: sync type (Initial/Manual/Auto), sync method (Standard/Enhanced), provider, and organization |
| Auto-Disable | Automatic deactivation of a folder monitor after 5 consecutive failures to prevent wasted API calls |
| Auto-Sync | The process of automatically syncing documents detected by folder monitors without manual intervention |
| Chunk | A segment of a document stored as a vector embedding. Size is configurable in tokens (default: 512) |
| Chunk Overlap | The number of tokens shared between consecutive chunks to preserve context across boundaries |
| Contextual Retrieval | An advanced re-sync technique where an LLM generates a context statement for each chunk before embedding, reported to reduce failed retrievals by up to ~49% |
| cl100k_base | The tokenizer encoding used by GPT-4 and GPT-3.5-turbo. Vector Data Loader uses it for consistent token counting across all embedding providers |
| Citation | A reference to the original source document, typically shown as a clickable link in RAG responses |
| Connection Details | Public connection information (URL, keys, table names) for integrating external RAG apps with your vector store |
| Connector | An integration that connects to external platforms (Confluence, Google Drive, etc.) |
| Credit | The billing unit in Vector Data Loader. 1 credit = $0.01. Operations consume credits based on complexity |
| Credit Balance | Total available credits combining free monthly allocation and purchased credits |
| Credit Package | A bundle of credits available for one-time purchase. Packages offer volume discounts (17-44% savings) |
| Cron Job | A scheduled task that runs automatically at defined intervals (e.g., every 5 minutes for folder monitoring) |
| DOCX | Microsoft Word document format (.docx). Supported for direct upload and storage-based sources |
| Embedding | A numerical vector representation of text that captures semantic meaning |
| Enhanced Sync | Document sync using Contextual Retrieval, consuming 40 credits per document (2x standard) for improved retrieval accuracy |
| Folder Monitor | A configuration that watches a specific folder in a connected source for new files and automatically syncs them |
| Free Credits | 250 credits allocated monthly to all accounts. They reset each billing cycle, are consumed before purchased credits, and do not accumulate from month to month |
| Google Workspace | Google's suite of cloud-native document formats including Google Docs, Google Sheets, and Google Slides |
| Header-Aware Splitting | A text splitting strategy that detects header tags (HTML) or markdown headers and splits documents at section boundaries while preserving the header hierarchy in chunk metadata |
| Integration IDs | Organization ID and Project ID displayed in Connection Details, used when configuring external tools like n8n or LangChain to scope vector store queries |
| LLM | Large Language Model - an AI model trained on vast text data to understand and generate human-like text (e.g., GPT-4, Claude) |
| Magic Link | A secure, single-use URL sent via email for passwordless authentication |
| OAuth | Secure authorization protocol used for third-party integrations (Google, GitHub, Confluence, etc.) |
| Organization | A workspace container for team collaboration, settings, and billing |
| Parser | A component that extracts text content from binary files (PDFs, images, Office documents) |
| Parse Quality Score | A confidence score (0-100%) indicating how well text was extracted from a document |
| pgvector | PostgreSQL extension for vector similarity search |
| Playground | An interface for testing semantic search and RAG chat against indexed documents |
| Polling | A technique where the system periodically checks for changes rather than waiting for push notifications |
| Purchased Credits | Credits bought via credit packages. They never expire and are consumed only after free credits are depleted |
| RAG | Retrieval-Augmented Generation - using retrieved documents to enhance LLM responses |
| Recursive Text Splitter | The default text splitting algorithm that splits content at natural language boundaries (paragraphs, sentences, words) using a hierarchy of separators. Used for plain text, PDFs, and other unstructured content |
| Re-Sync | The process of reprocessing a document by deleting its existing chunks, re-fetching content, re-chunking, and re-embedding |
| Role | A user's permission level within an organization (Owner, Admin, or Member) |
| Semantic Search | Finding documents based on meaning rather than exact keyword matches |
| Semantic Threshold | The minimum document size (in tokens) required for header-aware splitting to activate. Documents smaller than this threshold use the simpler recursive splitter to avoid over-fragmentation |
| Signed URL | A time-limited, secure URL that provides temporary access to private storage files |
| Similarity Score | A percentage (0-100%) indicating how closely a search result matches the query semantically |
| Source URL | The direct link to the original document in its source system, stored with each chunk for RAG citation |
| Standard Sync | Regular document sync consuming 20 credits per document |
| Storage Credits | Monthly charge of 200 credits per GiB for files stored via the Upload connector |
| Streaming | Real-time delivery of LLM responses piece by piece (token by token) as they are generated |
| Sync | The process of extracting content from a source and loading it into the vector store |
| Temperature | An LLM parameter controlling response creativity (0.0 = focused, 2.0 = highly creative) |
| tiktoken | OpenAI's library for tokenizing text. Vector Data Loader uses it via the js-tiktoken JavaScript port to count tokens accurately for chunk sizing, with the cl100k_base encoding for consistency |
| Token | A unit of text processing, roughly 4 characters or 0.75 words in English |
| Top K | The number of most similar results to return from a vector search |
| Transaction History | A chronological record of all credit transactions viewable in Settings > Billing, including purchases, syncs, and monthly billing events with audit metadata |
| Vector Store | A database optimized for storing and querying vector embeddings |
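
The credit entries above (Credit, Free Credits, Purchased Credits, Standard Sync, Enhanced Sync) fit together as simple arithmetic. The following is a minimal sketch of that arithmetic, not Vector Data Loader's actual billing code: the names `Balance`, `sync_cost`, `charge`, and `credits_to_usd` are hypothetical, and rejecting a charge that exceeds the balance is an assumption. Only the numeric constants come from the glossary.

```python
from dataclasses import dataclass

# Figures taken from the glossary entries above.
STANDARD_SYNC_CREDITS = 20  # Standard Sync: credits per document
ENHANCED_SYNC_CREDITS = 40  # Enhanced Sync: credits per document (2x standard)
CREDITS_PER_USD = 100       # 1 credit = $0.01


@dataclass
class Balance:
    """Hypothetical container for the two credit pools."""
    free: int       # remaining free monthly credits
    purchased: int  # remaining purchased credits

    @property
    def total(self) -> int:
        return self.free + self.purchased


def sync_cost(documents: int, enhanced: bool = False) -> int:
    """Credits needed to sync `documents` documents."""
    per_doc = ENHANCED_SYNC_CREDITS if enhanced else STANDARD_SYNC_CREDITS
    return documents * per_doc


def credits_to_usd(credits: int) -> float:
    """Convert credits to dollars at 1 credit = $0.01."""
    return credits / CREDITS_PER_USD


def charge(balance: Balance, credits: int) -> Balance:
    """Spend free credits first, then purchased credits.

    Raising on an insufficient balance is an assumption of this sketch.
    """
    if credits > balance.total:
        raise ValueError("insufficient credits")
    from_free = min(balance.free, credits)
    from_purchased = credits - from_free
    return Balance(free=balance.free - from_free,
                   purchased=balance.purchased - from_purchased)


# Example: an Enhanced Sync of 10 documents against a fresh monthly balance.
balance = Balance(free=250, purchased=1000)
cost = sync_cost(10, enhanced=True)
balance = charge(balance, cost)
print(cost, credits_to_usd(cost), balance)
```

In this example the 400-credit sync uses up all 250 free credits before drawing the remaining 150 from purchased credits, which matches the consumption order described in the Free Credits and Purchased Credits entries.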