Glossary

| Term | Definition |
| --- | --- |
| Archived Project | A project that has been soft-deleted and hidden from normal views but can be restored. Archived projects do not count against project limits |
| Audit Metadata | Additional details stored with sync credit transactions for billing transparency: sync type (Initial/Manual/Auto), sync method (Standard/Enhanced), provider, and organization |
| Auto-Disable | Automatic deactivation of a folder monitor after 5 consecutive failures to prevent wasted API calls |
| Auto-Sync | The process of automatically syncing documents detected by folder monitors without manual intervention |
| Chunk | A segment of a document stored as a vector embedding. Size is configurable in tokens (default: 512) |
| Chunk Overlap | The number of tokens shared between consecutive chunks to preserve context across boundaries |
| Contextual Retrieval | An advanced re-sync technique where an LLM generates a context statement for each chunk before embedding, improving retrieval accuracy by ~49% |
| cl100k_base | The tokenizer encoding used by GPT-4 and GPT-3.5-turbo. Vector Data Loader uses it for consistent token counting across all embedding providers |
| Citation | A reference to the original source document, typically shown as a clickable link in RAG responses |
| Connection Details | Public connection information (URL, keys, table names) for integrating external RAG apps with your vector store |
| Connector | An integration that connects to external platforms (Confluence, Google Drive, etc.) |
| Credit | The billing unit in Vector Data Loader. 1 credit = $0.01. Operations consume credits based on complexity |
| Credit Balance | Total available credits, combining the free monthly allocation and purchased credits |
| Credit Package | A bundle of credits available for one-time purchase. Packages offer volume discounts (17-44% savings) |
| Cron Job | A scheduled task that runs automatically at defined intervals (e.g., every 5 minutes for folder monitoring) |
| DOCX | Microsoft Word document format (.docx). Supported for direct upload and storage-based sources |
| Embedding | A numerical vector representation of text that captures semantic meaning |
| Enhanced Sync | Document sync using Contextual Retrieval, consuming 40 credits per document (2x standard) for improved retrieval accuracy |
| Folder Monitor | A configuration that watches a specific folder in a connected source for new files and automatically syncs them |
| Free Credits | 250 credits allocated monthly to all accounts. They reset each billing cycle, are consumed before purchased credits, and do not accumulate |
| Google Workspace | Google's suite of cloud-native document formats, including Google Docs, Google Sheets, and Google Slides |
| Header-Aware Splitting | A text splitting strategy that detects header tags (HTML) or markdown headers and splits documents at section boundaries while preserving the header hierarchy in chunk metadata |
| Integration IDs | Organization ID and Project ID displayed in Connection Details, used when configuring external tools like n8n or LangChain to scope vector store queries |
| LLM | Large Language Model: an AI model trained on vast text data to understand and generate human-like text (e.g., GPT-4, Claude) |
| Magic Link | A secure, single-use URL sent via email for passwordless authentication |
| OAuth | Secure authorization protocol used for third-party integrations (Google, GitHub, Confluence, etc.) |
| Organization | A workspace container for team collaboration, settings, and billing |
| Parser | A component that extracts text content from binary files (PDFs, images, Office documents) |
| Parse Quality Score | A confidence score (0-100%) indicating how well text was extracted from a document |
| pgvector | PostgreSQL extension for vector similarity search |
| Playground | An interface for testing semantic search and RAG chat against indexed documents |
| Polling | A technique where the system periodically checks for changes rather than waiting for push notifications |
| Purchased Credits | Credits bought via credit packages. They never expire and are consumed after free credits are depleted |
| RAG | Retrieval-Augmented Generation: using retrieved documents to enhance LLM responses |
| Recursive Text Splitter | The default text splitting algorithm that splits content at natural language boundaries (paragraphs, sentences, words) using a hierarchy of separators. Used for plain text, PDFs, and other unstructured content |
| Re-Sync | The process of reprocessing a document by deleting its existing chunks, re-fetching content, re-chunking, and re-embedding |
| Role | A user's permission level within an organization (Owner, Admin, or Member) |
| Semantic Search | Finding documents based on meaning rather than exact keyword matches |
| Semantic Threshold | The minimum document size (in tokens) required for header-aware splitting to activate. Documents smaller than this threshold use the simpler recursive splitter to avoid over-fragmentation |
| Signed URL | A time-limited, secure URL that provides temporary access to private storage files |
| Similarity Score | A percentage (0-100%) indicating how closely a search result matches the query semantically |
| Source URL | The direct link to the original document in its source system, stored with each chunk for RAG citation |
| Standard Sync | Regular document sync consuming 20 credits per document |
| Storage Credits | Monthly charge of 200 credits per GiB for files stored via the Upload connector |
| Streaming | Real-time delivery of LLM responses word-by-word as they are generated |
| Sync | The process of extracting content from a source and loading it into the vector store |
| Temperature | An LLM parameter controlling response creativity (0.0 = focused, 2.0 = highly creative) |
| tiktoken | A library for tokenizing text, used by Vector Data Loader via js-tiktoken to accurately count tokens for chunk sizing. Uses the cl100k_base encoding for consistency |
| Token | A unit of text processing, roughly 4 characters or 0.75 words in English |
| Top K | The number of most similar results to return from a vector search |
| Transaction History | A chronological record of all credit transactions viewable in Settings > Billing, including purchases, syncs, and monthly billing events with audit metadata |
| Vector Store | A database optimized for storing and querying vector embeddings |
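
The Credit, Free Credits, Purchased Credits, Standard Sync, and Enhanced Sync entries together describe a small piece of arithmetic: syncs cost 20 or 40 credits, free credits are drained before purchased ones, and each credit is worth $0.01. The sketch below works through one example. The `Balance` class and `charge` function are hypothetical names used only for illustration; they are not part of Vector Data Loader, and the starting balance is made up.

```python
from dataclasses import dataclass

# Figures taken from the glossary entries; everything else is illustrative.
CREDIT_VALUE_USD = 0.01
SYNC_COST = {"standard": 20, "enhanced": 40}  # credits per document


@dataclass
class Balance:
    free: int       # monthly allocation, consumed first
    purchased: int  # never expires, consumed after free credits run out

    @property
    def total(self) -> int:
        return self.free + self.purchased


def charge(balance: Balance, credits: int) -> Balance:
    """Return a new balance after spending `credits`, draining free credits first."""
    if credits > balance.total:
        raise ValueError("insufficient credits")
    from_free = min(balance.free, credits)
    return Balance(
        free=balance.free - from_free,
        purchased=balance.purchased - (credits - from_free),
    )


# Hypothetical account: the 250 free monthly credits plus 100 purchased credits.
balance = Balance(free=250, purchased=100)

# 12 standard syncs and 1 enhanced sync: 12 * 20 + 1 * 40 = 280 credits.
cost = 12 * SYNC_COST["standard"] + 1 * SYNC_COST["enhanced"]
after = charge(balance, cost)

print(after)                               # Balance(free=0, purchased=70)
print(f"${cost * CREDIT_VALUE_USD:.2f}")   # $2.80
```

All 250 free credits are used before the remaining 30 credits come out of the purchased balance, which matches the ordering the glossary describes.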
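
The Chunk and Chunk Overlap entries can be illustrated with a fixed-size sliding window over a token list. This is a simplified sketch, not the Recursive Text Splitter or Header-Aware Splitting described in the glossary: it ignores natural-language boundaries and only shows how overlap makes consecutive chunks share tokens. The function name `split_with_overlap` and the overlap value of 64 are assumptions; the glossary states only the default chunk size of 512. Integers stand in for cl100k_base token IDs.

```python
def split_with_overlap(tokens, chunk_size=512, overlap=64):
    """Split a token list into windows of `chunk_size` that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Stand-in for a 1000-token document (real token IDs would come from a tokenizer).
tokens = list(range(1000))
chunks = split_with_overlap(tokens)

print([len(c) for c in chunks])          # [512, 512, 104]
print(chunks[0][-64:] == chunks[1][:64])  # True
```

The last 64 tokens of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still seen whole in at least one chunk.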