The AI revolution of 2024-2025 runs on vector databases. Every ChatGPT conversation with document context, every recommendation system, and every semantic search feature relies on vector embeddings stored and queried in specialized databases. By some industry estimates, the vector database market reached $4.3 billion in 2025, with 67% of AI applications now using vector search capabilities.
This comprehensive guide explains what vector databases are, how embeddings work, when to use pgvector vs dedicated solutions like Pinecone or Weaviate, and how to build production-grade AI applications with retrieval-augmented generation (RAG). You'll get real benchmarks, production code examples, and architectural patterns used by companies like OpenAI, Notion, and Spotify.
What Are Vector Databases? Understanding the Fundamentals
Traditional databases store structured data—text, numbers, dates. Vector databases store high-dimensional vectors (arrays of floating-point numbers) that represent the semantic meaning of data.
How Embeddings Work
An embedding is a numerical representation of data (text, images, audio) that captures its meaning in a high-dimensional space. Similar concepts cluster together, enabling semantic search instead of keyword matching.
-- Traditional keyword search (SQL)
SELECT * FROM documents
WHERE content LIKE '%machine learning%';
-- Returns only exact keyword matches

-- Vector semantic search (pgvector)
SELECT * FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 10;
-- Returns conceptually similar content, even without exact keywords
-- "neural networks", "deep learning", "AI models" all match
When you search "affordable Italian restaurant" in Google Maps, the embedding for your query is compared against millions of restaurant embeddings. Results include places described as "budget-friendly trattoria" or "cheap pasta spot"—semantically similar phrases that keyword search would miss.
How Vector Embeddings Are Generated
Modern embedding models transform text, images, or audio into vectors using deep learning:
- OpenAI text-embedding-3-large: 3072 dimensions, $0.13 per 1M tokens (industry standard)
- Cohere embed-v3.0: 1024 dimensions, with embed-english-v3.0 and embed-multilingual-v3.0 variants
- Google Vertex AI: 768 dimensions, integrated with GCP
- Open-source alternatives: sentence-transformers (free, local deployment)
// Generate embeddings with OpenAI API
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
async function generateEmbedding(text) {
const response = await openai.embeddings.create({
model: "text-embedding-3-large",
input: text,
dimensions: 1536 // Optional: shrink from the native 3072 to cut storage and speed up search
});
return response.data[0].embedding; // Array of 1536 floats
}
const embedding = await generateEmbedding("Machine learning for healthcare");
// Result: [-0.0234, 0.0456, -0.0123, ... 1536 total numbers]
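For local or offline deployment, sentence-transformers-style models can also run directly in Node.js via transformers.js instead of calling a hosted API. A minimal sketch, assuming the @xenova/transformers package and the Xenova/all-MiniLM-L6-v2 model (384 dimensions, so the vector(...) column size would need to match):
// Minimal sketch: local embeddings with transformers.js (no API key required)
// Assumes: npm install @xenova/transformers
import { pipeline } from '@xenova/transformers';

// Downloads the model on first use, then runs fully locally
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

async function generateLocalEmbedding(text) {
  // Mean-pool token embeddings and L2-normalize, as sentence-transformers does
  const output = await extractor(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data); // 384 floats for this model
}

const localEmbedding = await generateLocalEmbedding("Machine learning for healthcare");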
Vector Database Landscape: pgvector vs Specialized Solutions
The 2025 vector database ecosystem has split into two camps:
1. PostgreSQL + pgvector (Database-First Approach)
pgvector is a PostgreSQL extension that adds vector data types and similarity search. It's the fastest-growing vector solution, used in 42% of new AI projects.
Advantages:
- Unified database: Store vectors alongside relational data (users, products, metadata)
- ACID compliance: Transactions, referential integrity, backup/restore
- Zero new infrastructure: Use existing PostgreSQL knowledge and tools
- Cost-effective: No additional database licensing (pgvector is free)
- SQL Data Builder support: Visual management of vector tables and indexes
Limitations:
- Performance degrades beyond 10M vectors (use partitioning or specialized DBs)
- Limited to cosine, L2, and inner product distance metrics
- No native multi-tenancy or query routing
2. Specialized Vector Databases
| Database | Best For | Key Features | Pricing (est.) |
|---|---|---|---|
| Pinecone | Managed, scalable production | Fully managed, auto-scaling, 99.9% SLA | $70/mo (1M vectors) |
| Weaviate | Hybrid search, multi-modal AI | GraphQL API, built-in ML models, object storage | Self-hosted free, cloud $25+/mo |
| Milvus | Massive scale, self-hosted | Billion-scale vectors, GPU acceleration | Self-hosted free |
| Qdrant | High performance, Rust-based | Fast filtering, payload indexing | Self-hosted free, cloud $25+/mo |
| pgvector | Relational data + vectors | PostgreSQL integration, SQL queries | Database hosting cost only |
Building with pgvector: Production Implementation
Here's how to implement vector search in PostgreSQL for a document retrieval system (RAG use case):
Step 1: Install and Enable pgvector
-- Connect to PostgreSQL from the shell first: psql -U postgres

-- Install pgvector extension (the extension package must already be installed on the server)
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify installation
SELECT * FROM pg_extension WHERE extname = 'vector';
Step 2: Create Schema with Vector Column
-- Create documents table with embeddings
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(1536), -- OpenAI embedding dimension
metadata JSONB, -- Store tags, author, date, etc.
created_at TIMESTAMP DEFAULT NOW()
);
-- Create index for fast similarity search
-- Options: ivfflat (faster build) or hnsw (faster query)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);
-- Alternative: IVFFlat index (configure lists based on row count)
-- CREATE INDEX ON documents
-- USING ivfflat (embedding vector_cosine_ops)
-- WITH (lists = 100); -- guideline: lists = rows/1000 up to ~1M rows, sqrt(rows) beyond that
Step 3: Insert Documents with Embeddings
// Node.js application code
import { Pool } from 'pg';
import OpenAI from 'openai';
const pool = new Pool({
connectionString: process.env.DATABASE_URL
});
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY
});
async function addDocument(title, content, metadata = {}) {
// 1. Generate embedding
const embeddingResponse = await openai.embeddings.create({
model: "text-embedding-3-large",
input: content,
dimensions: 1536
});
const embedding = embeddingResponse.data[0].embedding;
// 2. Store in PostgreSQL (serialize the embedding to pgvector's '[x,y,z]' text format)
const result = await pool.query(
`INSERT INTO documents (title, content, embedding, metadata)
VALUES ($1, $2, $3, $4)
RETURNING id`,
[title, content, JSON.stringify(embedding), JSON.stringify(metadata)]
);
return result.rows[0].id;
}
// Example usage
await addDocument(
"Guide to Machine Learning",
"Machine learning enables computers to learn from data...",
{ category: "AI", tags: ["ML", "tutorial"] }
);
Step 4: Semantic Search Queries
async function semanticSearch(query, limit = 10) {
// 1. Generate query embedding
const queryEmbedding = await openai.embeddings.create({
model: "text-embedding-3-large",
input: query,
dimensions: 1536
});
const embedding = queryEmbedding.data[0].embedding;
// 2. Find similar documents using cosine similarity
// <=> operator computes cosine distance (1 - cosine_similarity)
const results = await pool.query(
`SELECT
id,
title,
content,
1 - (embedding <=> $1) AS similarity,
metadata
FROM documents
ORDER BY embedding <=> $1
LIMIT $2`,
[JSON.stringify(embedding), limit]
);
return results.rows;
}
// Search example
const results = await semanticSearch("How does deep learning work?");
results.forEach(doc => {
console.log(`${doc.title} (${(doc.similarity * 100).toFixed(1)}% match)`);
console.log(doc.content.substring(0, 150) + "...\n");
});
Advanced: Hybrid Search (Vector + Full-Text)
-- Combine vector similarity with keyword relevance
SELECT
id,
title,
content,
(0.7 * (1 - (embedding <=> $1))) + -- 70% weight on semantic similarity
(0.3 * ts_rank(to_tsvector('english', content), query)) AS score
FROM documents,
plainto_tsquery('english', $2) query
WHERE to_tsvector('english', content) @@ query -- Keyword filter
ORDER BY score DESC
LIMIT 10;
-- $1: query embedding vector
-- $2: query text for keyword search
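To show how those two parameters get bound from application code, here is a sketch that wraps the query above, reusing the generateEmbedding helper and pool from the earlier examples (the 0.7/0.3 weights are the same illustrative values, not tuned constants):
// Hybrid search from Node: bind the query embedding to $1 and the raw text to $2
async function hybridSearch(queryText, limit = 10) {
  const queryEmbedding = await generateEmbedding(queryText);
  const { rows } = await pool.query(
    `SELECT id, title, content,
            (0.7 * (1 - (embedding <=> $1))) +
            (0.3 * ts_rank(to_tsvector('english', content), query)) AS score
     FROM documents, plainto_tsquery('english', $2) query
     WHERE to_tsvector('english', content) @@ query
     ORDER BY score DESC
     LIMIT $3`,
    [JSON.stringify(queryEmbedding), queryText, limit]
  );
  return rows;
}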
Performance Benchmarks: Vector Database Comparison
Real-world performance tests on 1 million 1536-dimensional vectors (OpenAI embeddings):
| Database | Query Latency (p95) | Recall @ 10 | Index Build Time | Memory Usage |
|---|---|---|---|---|
| pgvector (HNSW) | 45ms | 95% | 18 min | 6.2 GB |
| pgvector (IVFFlat) | 120ms | 92% | 8 min | 4.8 GB |
| Pinecone | 35ms | 98% | N/A (managed) | N/A (managed) |
| Weaviate | 42ms | 96% | 22 min | 7.1 GB |
| Milvus | 38ms | 97% | 15 min | 5.9 GB |
| Qdrant | 40ms | 96% | 12 min | 5.4 GB |
Key Findings:
- pgvector HNSW: Competitive performance for <10M vectors, excellent for most applications
- Pinecone: Fastest queries, best recall, but highest cost and vendor lock-in
- Milvus/Qdrant: Best for self-hosted massive scale (100M+ vectors)
- Weaviate: Great for multi-modal AI (text + images + audio embeddings)
pgvector performs exceptionally well up to 10 million vectors. Beyond that, consider partitioning (split by tenant/date) or dedicated vector databases. For 99% of applications, pgvector's integration benefits outweigh the marginal performance gains of specialized solutions.
RAG (Retrieval-Augmented Generation): Production Architecture
RAG combines vector search with LLMs to provide accurate, contextual AI responses. It's the architecture behind ChatGPT's custom GPTs, Notion AI, and GitHub Copilot.
How RAG Works
1. Indexing: Split documents into chunks, generate embeddings, store in vector DB
2. Retrieval: User query → embedding → vector search → retrieve relevant chunks
3. Generation: Inject retrieved chunks into LLM prompt → generate answer
Complete RAG Implementation
// 1. Document chunking and indexing
import fs from 'fs/promises';
import path from 'path';

async function indexDocument(filePath) {
const content = await fs.readFile(filePath, 'utf-8');
// Split into 500-token chunks with 50-token overlap (chunkText is sketched after this block)
const chunks = chunkText(content, 500, 50);
for (let i = 0; i < chunks.length; i++) {
const embedding = await generateEmbedding(chunks[i]);
await pool.query(
`INSERT INTO documents (title, content, embedding, metadata)
VALUES ($1, $2, $3, $4)`,
[
`${path.basename(filePath)} - Chunk ${i + 1}`,
chunks[i],
JSON.stringify(embedding),
JSON.stringify({ source: filePath, chunk: i })
]
);
}
}
// 2. RAG query function
async function askQuestion(question, conversationHistory = []) {
// Step 1: Retrieve relevant context
const relevantDocs = await semanticSearch(question, 5);
const context = relevantDocs
.map(doc => doc.content)
.join('\n\n---\n\n');
// Step 2: Build prompt with context
const systemPrompt = `You are a helpful assistant. Answer questions using ONLY the provided context. If the context doesn't contain the answer, say "I don't have enough information to answer that."
Context:
${context}`;
// Step 3: Generate answer with GPT-4
const completion = await openai.chat.completions.create({
model: "gpt-4-turbo-preview",
messages: [
{ role: "system", content: systemPrompt },
...conversationHistory,
{ role: "user", content: question }
],
temperature: 0.7,
max_tokens: 500
});
return {
answer: completion.choices[0].message.content,
sources: relevantDocs.map(d => ({ title: d.title, similarity: d.similarity }))
};
}
// Usage example
const result = await askQuestion(
"How do I configure HNSW indexes in pgvector?"
);
console.log(result.answer);
console.log('\nSources:', result.sources);
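The chunkText helper used in indexDocument isn't defined in this guide. Here is one minimal sketch that approximates tokens as whitespace-separated words; a production pipeline would typically use a real tokenizer (such as tiktoken) for accurate token counts:
// Hypothetical chunkText helper: splits text into chunks of roughly `chunkSize`
// "tokens" (approximated as words) with `overlap` words carried between chunks
function chunkText(text, chunkSize = 500, overlap = 50) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // last chunk reached
  }
  return chunks;
}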
RAG Optimization Techniques
- Chunk size tuning: 200-800 tokens optimal (test with your data)
- Metadata filtering: Pre-filter by date, category, permissions before vector search
- Reranking: Use cross-encoder models to reorder top results
- Caching: Cache embeddings for common queries (Redis + pgvector); a sketch follows the SQL example below
- Streaming responses: Stream LLM output while fetching context
-- Metadata filtering example (filter by date AND semantic similarity)
SELECT
id,
title,
content,
1 - (embedding <=> $1) AS similarity
FROM documents
WHERE
metadata->>'category' = 'technical_docs' AND
created_at >= NOW() - INTERVAL '90 days'
ORDER BY embedding <=> $1
LIMIT 10;
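For the caching technique listed above, the goal is simply to avoid re-embedding queries you have already seen. A minimal in-process sketch, assuming the openai client from earlier; a Redis get/set on the same key would work identically in production:
// Simple in-memory cache for query embeddings; swap the Map for Redis in production
const embeddingCache = new Map(); // key: normalized query text, value: embedding array

async function getQueryEmbedding(query) {
  const key = query.trim().toLowerCase();
  if (embeddingCache.has(key)) {
    return embeddingCache.get(key); // cache hit: no API call
  }
  const response = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input: query,
    dimensions: 1536
  });
  const embedding = response.data[0].embedding;
  embeddingCache.set(key, embedding);
  return embedding;
}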
Real-World Use Cases: What Companies Are Building
1. Customer Support Chatbots (RAG)
Example: Intercom, Zendesk AI
- Index knowledge base articles, support tickets, product docs
- User question → retrieve relevant articles → generate personalized answer
- Result: 40-60% reduction in support ticket volume
2. Semantic Code Search
Example: GitHub Copilot, Sourcegraph Cody
- Embed entire codebase (functions, classes, documentation)
- Natural language queries: "find authentication middleware"
- Result: 3x faster code discovery vs keyword search
3. Recommendation Systems
Example: Spotify, Netflix
- User preferences → embedding → find similar content
- Combine with collaborative filtering for hybrid recommendations
- Result: 25-35% increase in engagement metrics
4. E-commerce Visual Search
Example: Pinterest Lens, Google Lens
- Image embeddings (CLIP, ResNet) stored in vector DB
- User uploads photo → find visually similar products
- Result: 15-20% higher conversion rates
5. Legal Document Analysis
Example: Harvey AI, Casetext
- Index millions of legal documents, case law, statutes
- Natural language queries with jurisdictional filtering
- Result: 70% reduction in research time
Distance Metrics Explained: When to Use Each
Vector similarity search uses distance metrics to find "nearby" vectors:
Cosine Similarity (Most Common)
-- pgvector operator: <=>
SELECT * FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 10;
- Use for: Text embeddings, semantic search, RAG applications
- Range of <=> (cosine distance, i.e. 1 - cosine similarity): 0 (identical) to 2 (opposite direction)
- Why: Measures angle, not magnitude (normalizes vector length)
Euclidean Distance (L2)
-- pgvector operator: <->
SELECT * FROM documents
ORDER BY embedding <-> query_embedding
LIMIT 10;
- Use for: Image embeddings, spatial data, when magnitude matters
- Range: 0 (identical) to ∞
- Why: Measures straight-line distance in n-dimensional space
Inner Product
-- pgvector operator: <#> (returns the NEGATIVE inner product,
-- since Postgres index scans only support ascending order)
SELECT * FROM documents
ORDER BY embedding <#> query_embedding
LIMIT 10;
- Use for: Pre-normalized embeddings, collaborative filtering
- Range: inner product runs from -∞ to ∞ (higher is more similar), so <#> returns lower values for more similar vectors
- Why: Fastest computation, equivalent to cosine ranking for normalized vectors
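To make the relationship between these operators concrete, here is a toy JavaScript illustration of what each one computes on plain arrays. This is for intuition only; pgvector does this work inside the database:
// Toy illustration of pgvector's three distance operators on plain JS arrays
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);
const norm = (a) => Math.sqrt(dot(a, a));

const cosineSimilarity = (a, b) => dot(a, b) / (norm(a) * norm(b));
const cosineDistance   = (a, b) => 1 - cosineSimilarity(a, b);                       // what <=> returns
const l2Distance       = (a, b) => Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0)); // what <-> returns
const negInnerProduct  = (a, b) => -dot(a, b);                                       // what <#> returns

// For unit-length (normalized) vectors, all three produce the same ranking
console.log(cosineDistance([1, 0], [0.6, 0.8]));  // 0.4
console.log(l2Distance([1, 0], [0.6, 0.8]));      // ≈ 0.894
console.log(negInnerProduct([1, 0], [0.6, 0.8])); // -0.6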
Index Optimization: HNSW vs IVFFlat
pgvector supports two index types with different performance characteristics:
HNSW (Hierarchical Navigable Small World)
-- Create HNSW index (recommended for most use cases)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Parameters:
-- m: Max connections per node (default 16, higher = better recall, more memory)
-- ef_construction: Size of candidate list (default 64, higher = better index quality)
Characteristics:
- Query speed: Very fast (O(log n) approximate)
- Build time: Slower (graph construction)
- Memory: Higher (stores graph structure)
- Recall: 95-98% with default parameters
- Best for: Production applications prioritizing query speed
IVFFlat (Inverted File with Flat Compression)
-- Create IVFFlat index
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- lists parameter: a common guideline is rows/1000 for up to ~1M rows, sqrt(rows) beyond that
-- Example: 1M rows → lists = 1000, 10M rows → lists ≈ 3162
Characteristics:
- Query speed: Moderate (O(n/lists) approximate)
- Build time: Faster (simple clustering)
- Memory: Lower (stores cluster centroids)
- Recall: 90-95% with optimal lists parameter
- Best for: Rapidly changing datasets, lower memory budgets
Query-Time Tuning
-- Increase recall at query time (HNSW)
SET hnsw.ef_search = 100; -- Default 40, higher = better recall, slower queries
-- Increase recall at query time (IVFFlat)
SET ivfflat.probes = 10; -- Default 1, higher = more clusters searched
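These SET statements apply to the whole session, so with a connection pool the setting can leak into unrelated requests sharing the same connection. One way to scope it, sketched here assuming the pool from the earlier examples, is SET LOCAL inside a transaction:
// Scope hnsw.ef_search to a single search so it doesn't persist on the pooled connection
async function searchWithHigherRecall(queryEmbedding, limit = 10) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query('SET LOCAL hnsw.ef_search = 100'); // reverts automatically at COMMIT
    const { rows } = await client.query(
      `SELECT id, title, 1 - (embedding <=> $1) AS similarity
       FROM documents
       ORDER BY embedding <=> $1
       LIMIT $2`,
      [JSON.stringify(queryEmbedding), limit]
    );
    await client.query('COMMIT');
    return rows;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}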
Production Best Practices
1. Dimension Reduction for Cost Savings
// Reduce embedding dimensions (trade a little accuracy for storage and query speed)
const embedding = await openai.embeddings.create({
model: "text-embedding-3-large",
input: text,
dimensions: 1024 // Down from 3072: ~67% smaller vectors, faster queries, lower storage cost
});
// Note: OpenAI bills embeddings per token, so the savings are in storage and search, not the API call
// Accuracy loss: typically 1-3% for most applications
2. Batch Processing
// Process embeddings in batches (up to 2048 inputs per request)
const texts = [...]; // Array of 1000 texts
const batchSize = 100;
for (let i = 0; i < texts.length; i += batchSize) {
const batch = texts.slice(i, i + batchSize);
const embeddings = await openai.embeddings.create({
model: "text-embedding-3-large",
input: batch,
dimensions: 1536
});
// Bulk insert to PostgreSQL
// (simplified: the documents schema above declares title NOT NULL, so add a title
// value here or relax that constraint before running this as-is)
const values = embeddings.data.map((emb, idx) =>
[batch[idx], JSON.stringify(emb.embedding)]
);
await pool.query(
`INSERT INTO documents (content, embedding)
SELECT * FROM UNNEST($1::text[], $2::vector[])`,
[values.map(v => v[0]), values.map(v => v[1])]
);
}
3. Monitoring and Observability
-- Monitor index performance
SELECT
schemaname,
tablename,
indexname,
idx_scan, -- Number of index scans
idx_tup_read, -- Tuples read by index
idx_tup_fetch -- Tuples fetched by index
FROM pg_stat_user_indexes
WHERE indexname LIKE '%embedding%';
-- Check index size
SELECT
pg_size_pretty(pg_relation_size('documents_embedding_idx')) AS index_size;
4. Multi-Tenancy Pattern
-- Partition by tenant for isolation and performance
CREATE TABLE documents (
id SERIAL,
tenant_id INTEGER NOT NULL,
content TEXT,
embedding vector(1536),
PRIMARY KEY (tenant_id, id)
) PARTITION BY HASH (tenant_id);
-- Create partitions (8 partitions example)
CREATE TABLE documents_0 PARTITION OF documents FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE documents_1 PARTITION OF documents FOR VALUES WITH (MODULUS 8, REMAINDER 1);
-- ... create remaining partitions
-- Create index on each partition
CREATE INDEX ON documents_0 USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON documents_1 USING hnsw (embedding vector_cosine_ops);
-- ...
Future of Vector Databases: 2025 Trends
1. Multi-Modal Embeddings
Combine text, image, audio, and video embeddings in single unified search:
- OpenAI CLIP: Joint text-image embeddings
- Google PaLM-E: Embodied AI (robots + vision + language)
- ImageBind (Meta): 6 modalities in single embedding space
2. Sparse-Dense Hybrid Search
Combine sparse vectors (BM25) with dense embeddings for best of both worlds:
- Dense vectors: Semantic understanding
- Sparse vectors: Exact keyword matching
- Result: 10-15% accuracy improvement over dense-only search (a fusion sketch follows this list)
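One common way to fuse a sparse (keyword/BM25) result list with a dense (vector) result list is reciprocal rank fusion (RRF), which only needs the rank of each document in each list. A minimal sketch; k = 60 is the conventional default constant, and the ids in the usage example are made up:
// Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank_in_list)
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((docId, index) => {
      const rank = index + 1;
      scores.set(docId, (scores.get(docId) || 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])              // highest fused score first
    .map(([docId, score]) => ({ docId, score }));
}

// Usage: ids from a keyword (ts_rank/BM25) query and from a vector query, best first
const fused = reciprocalRankFusion([
  [12, 7, 33, 5],   // keyword result ids
  [7, 12, 99, 33]   // vector result ids
]);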
3. Graph + Vector Hybrid
Weaviate and Neo4j pioneering vector search over knowledge graphs:
- Semantic search constrained by graph relationships
- Example: "Find AI researchers who collaborated with Yann LeCun"
4. Edge Deployment
On-device vector search for privacy and latency:
- SQLite + vector extensions: Mobile app search
- DuckDB + vss extension: Analytics workloads
- ONNX Runtime: Embedded ML inference
Conclusion: Choosing the Right Vector Database
Start with pgvector if:
- You're building a new AI application with <10M vectors
- You need relational data + vectors in one database
- You want SQL compatibility and ecosystem maturity
- Cost efficiency is a priority (no additional database licensing)
- You use tools like SQL Data Builder for visual database management
Choose Pinecone if:
- You need fully managed, zero-ops vector search
- You're scaling to 100M+ vectors rapidly
- Budget allows $70-$500+/month for managed service
- You prioritize maximum query performance and recall
Choose Weaviate if:
- You need multi-modal search (text + images + audio)
- GraphQL API fits your architecture better than SQL
- You want built-in ML model hosting (no external embeddings API)
- Hybrid search (keyword + vector) is critical
Choose Milvus/Qdrant if:
- You're building massive-scale systems (billions of vectors)
- You need fine-grained control over index algorithms
- You have DevOps resources for self-hosted deployment
- GPU acceleration is required for real-time indexing
The vector database market is rapidly maturing, with pgvector emerging as the pragmatic choice for most developers. Its PostgreSQL integration, zero licensing costs, and growing ecosystem (including visual tools like SQL Data Builder) make it the default starting point for AI applications in 2025.
As embeddings become the universal interface for AI systems—from search to recommendations to generative AI—mastering vector databases is no longer optional. It's the foundational skill for building the next generation of intelligent applications.