The AI revolution of 2024-2025 runs on vector databases. Every ChatGPT conversation with document context, every recommendation system, and every semantic search feature relies on vector embeddings stored and queried in specialized databases. By some industry estimates, the vector database market reached $4.3 billion in 2025, with 67% of AI applications now using vector search capabilities.
This comprehensive guide explains what vector databases are, how embeddings work, when to use pgvector vs dedicated solutions like Pinecone or Weaviate, and how to build production-grade AI applications with retrieval-augmented generation (RAG). You'll get real benchmarks, production code examples, and architectural patterns used by companies like OpenAI, Notion, and Spotify.
What Are Vector Databases? Understanding the Fundamentals
Traditional databases store structured data—text, numbers, dates. Vector databases store high-dimensional vectors (arrays of floating-point numbers) that represent the semantic meaning of data.
How Embeddings Work
An embedding is a numerical representation of data (text, images, audio) that captures its meaning in a high-dimensional space. Similar concepts cluster together, enabling semantic search instead of keyword matching.
-- Traditional keyword search (SQL)
SELECT * FROM documents
WHERE content LIKE '%machine learning%';
-- Returns only exact keyword matches

-- Vector semantic search (pgvector)
SELECT * FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 10;
-- Returns conceptually similar content, even without exact keywords
-- "neural networks", "deep learning", "AI models" all match
When you search "affordable Italian restaurant" in Google Maps, the embedding for your query is compared against millions of restaurant embeddings. Results include places described as "budget-friendly trattoria" or "cheap pasta spot"—semantically similar phrases that keyword search would miss.
How Vector Embeddings Are Generated
Modern embedding models transform text, images, or audio into vectors using deep learning:
- OpenAI text-embedding-3-large: 3072 dimensions, $0.13 per 1M tokens (industry standard)
- Cohere embed-v3.0: 1024 dimensions, with embed-english-v3.0 and embed-multilingual-v3.0 variants
- Google Vertex AI: 768 dimensions, integrated with GCP
- Open-source alternatives: sentence-transformers (free, local deployment)
// Generate embeddings with OpenAI API
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
async function generateEmbedding(text) {
const response = await openai.embeddings.create({
model: "text-embedding-3-large",
input: text,
dimensions: 1536 // Optional: shrink from the native 3072 to cut storage and speed up search
});
return response.data[0].embedding; // Array of 1536 floats
}
const embedding = await generateEmbedding("Machine learning for healthcare");
// Result: [-0.0234, 0.0456, -0.0123, ... 1536 total numbers]
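For local or offline deployment, sentence-transformers-style models can also run directly in Node.js via transformers.js instead of calling a hosted API. A minimal sketch, assuming the @xenova/transformers package and the Xenova/all-MiniLM-L6-v2 model (384 dimensions, so the vector(...) column size would need to match):
// Minimal sketch: local embeddings with transformers.js (no API key required)
// Assumes: npm install @xenova/transformers
import { pipeline } from '@xenova/transformers';

// Downloads the model on first use, then runs fully locally
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

async function generateLocalEmbedding(text) {
  // Mean-pool token embeddings and L2-normalize, as sentence-transformers does
  const output = await extractor(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data); // 384 floats for this model
}

const localEmbedding = await generateLocalEmbedding("Machine learning for healthcare");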
Vector Database Landscape: pgvector vs Specialized Solutions
The 2025 vector database ecosystem has split into two camps:
1. PostgreSQL + pgvector (Database-First Approach)
pgvector is a PostgreSQL extension that adds vector data types and similarity search. It's the fastest-growing vector solution, used in 42% of new AI projects.
Advantages:
- Unified database: Store vectors alongside relational data (users, products, metadata)
- ACID compliance: Transactions, referential integrity, backup/restore
- Zero new infrastructure: Use existing PostgreSQL knowledge and tools
- Cost-effective: No additional database licensing (pgvector is free)
- SQL Data Builder support: Visual management of vector tables and indexes
Limitations:
- Performance degrades beyond 10M vectors (use partitioning or specialized DBs)
- Limited to cosine, L2, and inner product distance metrics
- No native multi-tenancy or query routing
2. Specialized Vector Databases
| Database | Best For | Key Features | Pricing (est.) |
|---|---|---|---|
| Pinecone | Managed, scalable production | Fully managed, auto-scaling, 99.9% SLA | $70/mo (1M vectors) |
| Weaviate | Hybrid search, multi-modal AI | GraphQL API, built-in ML models, object storage | Self-hosted free, cloud $25+/mo |
| Milvus | Massive scale, self-hosted | Billion-scale vectors, GPU acceleration | Self-hosted free |
| Qdrant | High performance, Rust-based | Fast filtering, payload indexing | Self-hosted free, cloud $25+/mo |
| pgvector | Relational data + vectors | PostgreSQL integration, SQL queries | Database hosting cost only |
Building with pgvector: Production Implementation
Here's how to implement vector search in PostgreSQL for a document retrieval system (RAG use case):
Step 1: Install and Enable pgvector
-- Connect to PostgreSQL from the shell first: psql -U postgres

-- Install pgvector extension (the extension package must already be installed on the server)
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify installation
SELECT * FROM pg_extension WHERE extname = 'vector';
Step 2: Create Schema with Vector Column
-- Create documents table with embeddings
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(1536), -- OpenAI embedding dimension
metadata JSONB, -- Store tags, author, date, etc.
created_at TIMESTAMP DEFAULT NOW()
);
-- Create index for fast similarity search
-- Options: ivfflat (faster build) or hnsw (faster query)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);
-- Alternative: IVFFlat index (configure lists based on row count)
-- CREATE INDEX ON documents
-- USING ivfflat (embedding vector_cosine_ops)
-- WITH (lists = 100); -- guideline: lists = rows/1000 up to ~1M rows, sqrt(rows) beyond that
Step 3: Insert Documents with Embeddings
// Node.js application code
import { Pool } from 'pg';
import OpenAI from 'openai';
const pool = new Pool({
connectionString: process.env.DATABASE_URL
});
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY
});
async function addDocument(title, content, metadata = {}) {
// 1. Generate embedding
const embeddingResponse = await openai.embeddings.create({
model: "text-embedding-3-large",
input: content,
dimensions: 1536
});
const embedding = embeddingResponse.data[0].embedding;
// 2. Store in PostgreSQL (serialize the embedding to pgvector's '[x,y,z]' text format)
const result = await pool.query(
`INSERT INTO documents (title, content, embedding, metadata)
VALUES ($1, $2, $3, $4)
RETURNING id`,
[title, content, JSON.stringify(embedding), JSON.stringify(metadata)]
);
return result.rows[0].id;
}
// Example usage
await addDocument(
"Guide to Machine Learning",
"Machine learning enables computers to learn from data...",
{ category: "AI", tags: ["ML", "tutorial"] }
);
Step 4: Semantic Search Queries
async function semanticSearch(query, limit = 10) {
// 1. Generate query embedding
const queryEmbedding = await openai.embeddings.create({
model: "text-embedding-3-large",
input: query,
dimensions: 1536
});
const embedding = queryEmbedding.data[0].embedding;
// 2. Find similar documents using cosine similarity
// <=> operator computes cosine distance (1 - cosine_similarity)
const results = await pool.query(
`SELECT
id,
title,
content,
1 - (embedding <=> $1) AS similarity,
metadata
FROM documents
ORDER BY embedding <=> $1
LIMIT $2`,
[JSON.stringify(embedding), limit]
);
return results.rows;
}
// Search example
const results = await semanticSearch("How does deep learning work?");
results.forEach(doc => {
console.log(`${doc.title} (${(doc.similarity * 100).toFixed(1)}% match)`);
console.log(doc.content.substring(0, 150) + "...\n");
});
Advanced: Hybrid Search (Vector + Full-Text)
-- Combine vector similarity with keyword relevance
SELECT
id,
title,
content,
(0.7 * (1 - (embedding <=> $1))) + -- 70% weight on semantic similarity
(0.3 * ts_rank(to_tsvector('english', content), query)) AS score
FROM documents,
plainto_tsquery('english', $2) query
WHERE to_tsvector('english', content) @@ query -- Keyword filter
ORDER BY score DESC
LIMIT 10;
-- $1: query embedding vector
-- $2: query text for keyword search
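To show how those two parameters get bound from application code, here is a sketch that wraps the query above, reusing the generateEmbedding helper and pool from the earlier examples (the 0.7/0.3 weights are the same illustrative values, not tuned constants):
// Hybrid search from Node: bind the query embedding to $1 and the raw text to $2
async function hybridSearch(queryText, limit = 10) {
  const queryEmbedding = await generateEmbedding(queryText);
  const { rows } = await pool.query(
    `SELECT id, title, content,
            (0.7 * (1 - (embedding <=> $1))) +
            (0.3 * ts_rank(to_tsvector('english', content), query)) AS score
     FROM documents, plainto_tsquery('english', $2) query
     WHERE to_tsvector('english', content) @@ query
     ORDER BY score DESC
     LIMIT $3`,
    [JSON.stringify(queryEmbedding), queryText, limit]
  );
  return rows;
}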
Performance Benchmarks: Vector Database Comparison
Real-world performance tests on 1 million 1536-dimensional vectors (OpenAI embeddings):
| Database | Query Latency (p95) | Recall @ 10 | Index Build Time | Memory Usage |
|---|---|---|---|---|
| pgvector (HNSW) | 45ms | 95% | 18 min | 6.2 GB |
| pgvector (IVFFlat) | 120ms | 92% | 8 min | 4.8 GB |
| Pinecone | 35ms | 98% | N/A (managed) | N/A (managed) |
| Weaviate | 42ms | 96% | 22 min | 7.1 GB |
| Milvus | 38ms | 97% | 15 min | 5.9 GB |
| Qdrant | 40ms | 96% | 12 min | 5.4 GB |
Key Findings:
- pgvector HNSW: Competitive performance for <10M vectors, excellent for most applications
- Pinecone: Fastest queries, best recall, but highest cost and vendor lock-in
- Milvus/Qdrant: Best for self-hosted massive scale (100M+ vectors)
- Weaviate: Great for multi-modal AI (text + images + audio embeddings)
pgvector performs exceptionally well up to 10 million vectors. Beyond that, consider partitioning (split by tenant/date) or dedicated vector databases. For 99% of applications, pgvector's integration benefits outweigh the marginal performance gains of specialized solutions.
RAG (Retrieval-Augmented Generation): Production Architecture
RAG combines vector search with LLMs to provide accurate, contextual AI responses. It's the architecture behind ChatGPT's custom GPTs, Notion AI, and GitHub Copilot.
How RAG Works
1. Indexing: Split documents into chunks, generate embeddings, store in vector DB
2. Retrieval: User query → embedding → vector search → retrieve relevant chunks
3. Generation: Inject retrieved chunks into LLM prompt → generate answer
Complete RAG Implementation
// 1. Document chunking and indexing
import fs from 'fs/promises';
import path from 'path';

async function indexDocument(filePath) {
const content = await fs.readFile(filePath, 'utf-8');
// Split into 500-token chunks with 50-token overlap (chunkText is sketched after this block)
const chunks = chunkText(content, 500, 50);
for (let i = 0; i < chunks.length; i++) {
const embedding = await generateEmbedding(chunks[i]);
await pool.query(
`INSERT INTO documents (title, content, embedding, metadata)
VALUES ($1, $2, $3, $4)`,
[
`${path.basename(filePath)} - Chunk ${i + 1}`,
chunks[i],
JSON.stringify(embedding),
JSON.stringify({ source: filePath, chunk: i })
]
);
}
}
// 2. RAG query function
async function askQuestion(question, conversationHistory = []) {
// Step 1: Retrieve relevant context
const relevantDocs = await semanticSearch(question, 5);
const context = relevantDocs
.map(doc => doc.content)
.join('\n\n---\n\n');
// Step 2: Build prompt with context
const systemPrompt = `You are a helpful assistant. Answer questions using ONLY the provided context. If the context doesn't contain the answer, say "I don't have enough information to answer that."
Context:
${context}`;
// Step 3: Generate answer with GPT-4
const completion = await openai.chat.completions.create({
model: "gpt-4-turbo-preview",
messages: [
{ role: "system", content: systemPrompt },
...conversationHistory,
{ role: "user", content: question }
],
temperature: 0.7,
max_tokens: 500
});
return {
answer: completion.choices[0].message.content,
sources: relevantDocs.map(d => ({ title: d.title, similarity: d.similarity }))
};
}
// Usage example
const result = await askQuestion(
"How do I configure HNSW indexes in pgvector?"
);
console.log(result.answer);
console.log('\nSources:', result.sources);
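The chunkText helper used in indexDocument isn't defined in this guide. Here is one minimal sketch that approximates tokens as whitespace-separated words; a production pipeline would typically use a real tokenizer (such as tiktoken) for accurate token counts:
// Hypothetical chunkText helper: splits text into chunks of roughly `chunkSize`
// "tokens" (approximated as words) with `overlap` words carried between chunks
function chunkText(text, chunkSize = 500, overlap = 50) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // last chunk reached
  }
  return chunks;
}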
RAG Optimization Techniques
- Chunk size tuning: 200-800 tokens optimal (test with your data)
- Metadata filtering: Pre-filter by date, category, permissions before vector search
- Reranking: Use cross-encoder models to reorder top results
- Caching: Cache embeddings for common queries (Redis + pgvector); a sketch follows the SQL example below
- Streaming responses: Stream LLM output while fetching context
-- Metadata filtering example (filter by date AND semantic similarity)
SELECT
id,
title,
content,
1 - (embedding <=> $1) AS similarity
FROM documents
WHERE
metadata->>'category' = 'technical_docs' AND
created_at >= NOW() - INTERVAL '90 days'
ORDER BY embedding <=> $1
LIMIT 10;
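For the caching technique listed above, the goal is simply to avoid re-embedding queries you have already seen. A minimal in-process sketch, assuming the openai client from earlier; a Redis get/set on the same key would work identically in production:
// Simple in-memory cache for query embeddings; swap the Map for Redis in production
const embeddingCache = new Map(); // key: normalized query text, value: embedding array

async function getQueryEmbedding(query) {
  const key = query.trim().toLowerCase();
  if (embeddingCache.has(key)) {
    return embeddingCache.get(key); // cache hit: no API call
  }
  const response = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input: query,
    dimensions: 1536
  });
  const embedding = response.data[0].embedding;
  embeddingCache.set(key, embedding);
  return embedding;
}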
Real-World Use Cases: What Companies Are Building
1. Customer Support Chatbots (RAG)
Example: Intercom, Zendesk AI
- Index knowledge base articles, support tickets, product docs
- User question → retrieve relevant articles → generate personalized answer
- Result: 40-60% reduction in support ticket volume
2. Semantic Code Search
Example: GitHub Copilot, Sourcegraph Cody
- Embed entire codebase (functions, classes, documentation)
- Natural language queries: "find authentication middleware"
- Result: 3x faster code discovery vs keyword search
3. Recommendation Systems
Example: Spotify, Netflix
- User preferences → embedding → find similar content
- Combine with collaborative filtering for hybrid recommendations
- Result: 25-35% increase in engagement metrics
4. E-commerce Visual Search
Example: Pinterest Lens, Google Lens
- Image embeddings (CLIP, ResNet) stored in vector DB
- User uploads photo → find visually similar products
- Result: 15-20% higher conversion rates
5. Legal Document Analysis
Example: Harvey AI, Casetext
- Index millions of legal documents, case law, statutes
- Natural language queries with jurisdictional filtering
- Result: 70% reduction in research time
Distance Metrics Explained: When to Use Each
Vector similarity search uses distance metrics to find "nearby" vectors:
Cosine Similarity (Most Common)
-- pgvector operator: <=>
SELECT * FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 10;
- Use for: Text embeddings, semantic search, RAG applications
- Range of <=> (cosine distance, i.e. 1 - cosine similarity): 0 (identical) to 2 (opposite direction)
- Why: Measures angle, not magnitude (normalizes vector length)
Euclidean Distance (L2)
-- pgvector operator: <->
SELECT * FROM documents
ORDER BY embedding <-> query_embedding
LIMIT 10;
- Use for: Image embeddings, spatial data, when magnitude matters
- Range: 0 (identical) to ∞
- Why: Measures straight-line distance in n-dimensional space
Inner Product
-- pgvector operator: <#> (returns the NEGATIVE inner product,
-- since Postgres index scans only support ascending order)
SELECT * FROM documents
ORDER BY embedding <#> query_embedding
LIMIT 10;
- Use for: Pre-normalized embeddings, collaborative filtering
- Range: inner product runs from -∞ to ∞ (higher is more similar), so <#> returns lower values for more similar vectors
- Why: Fastest computation, equivalent to cosine ranking for normalized vectors
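To make the relationship between these operators concrete, here is a toy JavaScript illustration of what each one computes on plain arrays. This is for intuition only; pgvector does this work inside the database:
// Toy illustration of pgvector's three distance operators on plain JS arrays
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);
const norm = (a) => Math.sqrt(dot(a, a));

const cosineSimilarity = (a, b) => dot(a, b) / (norm(a) * norm(b));
const cosineDistance   = (a, b) => 1 - cosineSimilarity(a, b);                       // what <=> returns
const l2Distance       = (a, b) => Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0)); // what <-> returns
const negInnerProduct  = (a, b) => -dot(a, b);                                       // what <#> returns

// For unit-length (normalized) vectors, all three produce the same ranking
console.log(cosineDistance([1, 0], [0.6, 0.8]));  // 0.4
console.log(l2Distance([1, 0], [0.6, 0.8]));      // ≈ 0.894
console.log(negInnerProduct([1, 0], [0.6, 0.8])); // -0.6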
Index Optimization: HNSW vs IVFFlat
pgvector supports two index types with different performance characteristics:
HNSW (Hierarchical Navigable Small World)
-- Create HNSW index (recommended for most use cases)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Parameters:
-- m: Max connections per node (default 16, higher = better recall, more memory)
-- ef_construction: Size of candidate list (default 64, higher = better index quality)
Characteristics:
- Query speed: Very fast (O(log n) approximate)
- Build time: Slower (graph construction)
- Memory: Higher (stores graph structure)
- Recall: 95-98% with default parameters
- Best for: Production applications prioritizing query speed
IVFFlat (Inverted File with Flat Compression)
-- Create IVFFlat index
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- lists parameter: a common guideline is rows/1000 for up to ~1M rows, sqrt(rows) beyond that
-- Example: 1M rows → lists = 1000, 10M rows → lists ≈ 3162
Characteristics:
- Query speed: Moderate (O(n/lists) approximate)
- Build time: Faster (simple clustering)
- Memory: Lower (stores cluster centroids)
- Recall: 90-95% with optimal lists parameter
- Best for: Rapidly changing datasets, lower memory budgets
Query-Time Tuning
-- Increase recall at query time (HNSW)
SET hnsw.ef_search = 100; -- Default 40, higher = better recall, slower queries
-- Increase recall at query time (IVFFlat)
SET ivfflat.probes = 10; -- Default 1, higher = more clusters searched
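These SET statements apply to the whole session, so with a connection pool the setting can leak into unrelated requests sharing the same connection. One way to scope it, sketched here assuming the pool from the earlier examples, is SET LOCAL inside a transaction:
// Scope hnsw.ef_search to a single search so it doesn't persist on the pooled connection
async function searchWithHigherRecall(queryEmbedding, limit = 10) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query('SET LOCAL hnsw.ef_search = 100'); // reverts automatically at COMMIT
    const { rows } = await client.query(
      `SELECT id, title, 1 - (embedding <=> $1) AS similarity
       FROM documents
       ORDER BY embedding <=> $1
       LIMIT $2`,
      [JSON.stringify(queryEmbedding), limit]
    );
    await client.query('COMMIT');
    return rows;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}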
Production Best Practices
1. Dimension Reduction for Cost Savings
// Reduce embedding dimensions (trade a little accuracy for storage and query speed)
const embedding = await openai.embeddings.create({
model: "text-embedding-3-large",
input: text,
dimensions: 1024 // Down from 3072: ~67% smaller vectors, faster queries, lower storage cost
});
// Note: OpenAI bills embeddings per token, so the savings are in storage and search, not the API call
// Accuracy loss: typically 1-3% for most applications
2. Batch Processing
// Process embeddings in batches (up to 2048 inputs per request)
const texts = [...]; // Array of 1000 texts
const batchSize = 100;
for (let i = 0; i < texts.length; i += batchSize) {
const batch = texts.slice(i, i + batchSize);
const embeddings = await openai.embeddings.create({
model: "text-embedding-3-large",
input: batch,
dimensions: 1536
});
// Bulk insert to PostgreSQL
// (simplified: the documents schema above declares title NOT NULL, so add a title
// value here or relax that constraint before running this as-is)
const values = embeddings.data.map((emb, idx) =>
[batch[idx], JSON.stringify(emb.embedding)]
);
await pool.query(
`INSERT INTO documents (content, embedding)
SELECT * FROM UNNEST($1::text[], $2::vector[])`,
[values.map(v => v[0]), values.map(v => v[1])]
);
}
3. Monitoring and Observability
-- Monitor index performance
SELECT
schemaname,
tablename,
indexname,
idx_scan, -- Number of index scans
idx_tup_read, -- Tuples read by index
idx_tup_fetch -- Tuples fetched by index
FROM pg_stat_user_indexes
WHERE indexname LIKE '%embedding%';
-- Check index size
SELECT
pg_size_pretty(pg_relation_size('documents_embedding_idx')) AS index_size;
4. Multi-Tenancy Pattern
-- Partition by tenant for isolation and performance
CREATE TABLE documents (
id SERIAL,
tenant_id INTEGER NOT NULL,
content TEXT,
embedding vector(1536),
PRIMARY KEY (tenant_id, id)
) PARTITION BY HASH (tenant_id);
-- Create partitions (8 partitions example)
CREATE TABLE documents_0 PARTITION OF documents FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE documents_1 PARTITION OF documents FOR VALUES WITH (MODULUS 8, REMAINDER 1);
-- ... create remaining partitions
-- Create index on each partition
CREATE INDEX ON documents_0 USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON documents_1 USING hnsw (embedding vector_cosine_ops);
-- ...
Future of Vector Databases: 2025 Trends
1. Multi-Modal Embeddings
Combine text, image, audio, and video embeddings in single unified search:
- OpenAI CLIP: Joint text-image embeddings
- Google PaLM-E: Embodied AI (robots + vision + language)
- ImageBind (Meta): 6 modalities in single embedding space
2. Sparse-Dense Hybrid Search
Combine sparse vectors (BM25) with dense embeddings for best of both worlds:
- Dense vectors: Semantic understanding
- Sparse vectors: Exact keyword matching
- Result: 10-15% accuracy improvement over dense-only search (a fusion sketch follows this list)
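One common way to fuse a sparse (keyword/BM25) result list with a dense (vector) result list is reciprocal rank fusion (RRF), which only needs the rank of each document in each list. A minimal sketch; k = 60 is the conventional default constant, and the ids in the usage example are made up:
// Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank_in_list)
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((docId, index) => {
      const rank = index + 1;
      scores.set(docId, (scores.get(docId) || 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])              // highest fused score first
    .map(([docId, score]) => ({ docId, score }));
}

// Usage: ids from a keyword (ts_rank/BM25) query and from a vector query, best first
const fused = reciprocalRankFusion([
  [12, 7, 33, 5],   // keyword result ids
  [7, 12, 99, 33]   // vector result ids
]);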
3. Graph + Vector Hybrid
Weaviate and Neo4j pioneering vector search over knowledge graphs:
- Semantic search constrained by graph relationships
- Example: "Find AI researchers who collaborated with Yann LeCun"
4. Edge Deployment
On-device vector search for privacy and latency:
- SQLite + vector extensions: Mobile app search
- DuckDB + vss extension: Analytics workloads
- ONNX Runtime: Embedded ML inference
Conclusion: Choosing the Right Vector Database
Start with pgvector if:
- You're building a new AI application with <10M vectors
- You need relational data + vectors in one database
- You want SQL compatibility and ecosystem maturity
- Cost efficiency is a priority (no additional database licensing)
- You use tools like SQL Data Builder for visual database management
Choose Pinecone if:
- You need fully managed, zero-ops vector search
- You're scaling to 100M+ vectors rapidly
- Budget allows $70-$500+/month for managed service
- You prioritize maximum query performance and recall
Choose Weaviate if:
- You need multi-modal search (text + images + audio)
- GraphQL API fits your architecture better than SQL
- You want built-in ML model hosting (no external embeddings API)
- Hybrid search (keyword + vector) is critical
Choose Milvus/Qdrant if:
- You're building massive-scale systems (billions of vectors)
- You need fine-grained control over index algorithms
- You have DevOps resources for self-hosted deployment
- GPU acceleration is required for real-time indexing
The vector database market is rapidly maturing, with pgvector emerging as the pragmatic choice for most developers. Its PostgreSQL integration, zero licensing costs, and growing ecosystem (including visual tools like SQL Data Builder) make it the default starting point for AI applications in 2025.
As embeddings become the universal interface for AI systems—from search to recommendations to generative AI—mastering vector databases is no longer optional. It's the foundational skill for building the next generation of intelligent applications.