AI Guide 📅 January 3, 2025 📖 12 min read

Vector Databases & AI 2025: Complete Guide to Embeddings & Semantic Search

Vector databases power modern AI applications from ChatGPT to recommendation engines. Learn how embeddings enable semantic search, compare pgvector vs Pinecone vs Weaviate, and build production RAG systems with real benchmarks.

The AI revolution of 2024-2025 runs on vector databases. Every ChatGPT conversation with document context, every recommendation system, every semantic search feature—they all rely on vector embeddings stored and queried in specialized databases. The vector database market has exploded to $4.3 billion in 2025, with 67% of AI applications now using vector search capabilities.

This comprehensive guide explains what vector databases are, how embeddings work, when to use pgvector vs dedicated solutions like Pinecone or Weaviate, and how to build production-grade AI applications with retrieval-augmented generation (RAG). You'll get real benchmarks, production code examples, and architectural patterns used by companies like OpenAI, Notion, and Spotify.

What Are Vector Databases? Understanding the Fundamentals

Traditional databases store structured data—text, numbers, dates. Vector databases store high-dimensional vectors (arrays of floating-point numbers) that represent the semantic meaning of data.

How Embeddings Work

An embedding is a numerical representation of data (text, images, audio) that captures its meaning in a high-dimensional space. Similar concepts cluster together, enabling semantic search instead of keyword matching.

-- Traditional keyword search (SQL)
SELECT * FROM documents
WHERE content LIKE '%machine learning%';
-- Returns only exact keyword matches

-- Vector semantic search (pgvector)
SELECT * FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 10;
-- Returns conceptually similar content, even without exact keywords
-- "neural networks", "deep learning", "AI models" all match

Real Example:

When you search "affordable Italian restaurant" in Google Maps, the embedding for your query is compared against millions of restaurant embeddings. Results include places described as "budget-friendly trattoria" or "cheap pasta spot"—semantically similar phrases that keyword search would miss.
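
Under the hood, "compared" means a similarity computation over the two vectors, most often cosine similarity. A minimal JavaScript sketch; the toy 3-dimensional vectors are illustrative, real embeddings have hundreds or thousands of dimensions:

// Cosine similarity: 1 = identical direction, 0 = unrelated, -1 = opposite
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineSimilarity([0.2, 0.8, 0.1], [0.25, 0.75, 0.05]); // ≈ 0.99, very similar
cosineSimilarity([0.2, 0.8, 0.1], [0.9, -0.1, 0.4]);   // ≈ 0.17, unrelated

A vector database performs exactly this kind of comparison, but over millions of stored vectors with an index that avoids scanning every row.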

How Vector Embeddings Are Generated

Modern embedding models transform text, images, or audio into vectors using deep learning:

// Generate embeddings with OpenAI API
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function generateEmbedding(text) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input: text,
    dimensions: 1536  // Can reduce from 3072 for cost savings
  });

  return response.data[0].embedding;  // Array of 1536 floats
}

const embedding = await generateEmbedding("Machine learning for healthcare");
// Result: [-0.0234, 0.0456, -0.0123, ... 1536 total numbers]

Vector Database Landscape: pgvector vs Specialized Solutions

The 2025 vector database ecosystem has split into two camps:

1. PostgreSQL + pgvector (Database-First Approach)

pgvector is a PostgreSQL extension that adds vector data types and similarity search. It's the fastest-growing vector solution, used in 42% of new AI projects.

Advantages:

  - One system for relational data and vectors: no second database to sync, back up, or secure
  - Full SQL: join embeddings against business tables, filter with WHERE, and keep writes in ACID transactions
  - No extra licensing: you pay only for your existing PostgreSQL hosting
  - Works with the whole Postgres ecosystem: ORMs, replication, monitoring, backups

Limitations:

  - Index tuning (HNSW/IVFFlat parameters) is manual; there is no managed auto-scaling tier
  - Performance degrades past roughly 10 million vectors without partitioning (see Scale Considerations below)
  - No built-in embedding generation: you call an embedding model yourself

2. Specialized Vector Databases

| Database | Best For | Key Features | Pricing (est.) |
|----------|----------|--------------|----------------|
| Pinecone | Managed, scalable production | Fully managed, auto-scaling, 99.9% SLA | $70/mo (1M vectors) |
| Weaviate | Hybrid search, multi-modal AI | GraphQL API, built-in ML models, object storage | Self-hosted free, cloud $25+/mo |
| Milvus | Massive scale, self-hosted | Billion-scale vectors, GPU acceleration | Self-hosted free |
| Qdrant | High performance, Rust-based | Fast filtering, payload indexing | Self-hosted free, cloud $25+/mo |
| pgvector | Relational data + vectors | PostgreSQL integration, SQL queries | Database hosting cost only |

Building with pgvector: Production Implementation

Here's how to implement vector search in PostgreSQL for a document retrieval system (RAG use case):

Step 1: Install and Enable pgvector

# Connect to PostgreSQL from a shell
psql -U postgres

-- Install pgvector extension (inside psql)
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify installation
SELECT * FROM pg_extension WHERE extname = 'vector';

Step 2: Create Schema with Vector Column

-- Create documents table with embeddings
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536),  -- OpenAI embedding dimension
  metadata JSONB,           -- Store tags, author, date, etc.
  created_at TIMESTAMP DEFAULT NOW()
);

-- Create index for fast similarity search
-- Options: ivfflat (faster build) or hnsw (faster query)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);

-- Alternative: IVFFlat index (configure lists based on row count)
-- CREATE INDEX ON documents
-- USING ivfflat (embedding vector_cosine_ops)
-- WITH (lists = 100);  -- lists = sqrt(total_rows) recommended

Step 3: Insert Documents with Embeddings

// Node.js application code
import { Pool } from 'pg';
import OpenAI from 'openai';

const pool = new Pool({
  connectionString: process.env.DATABASE_URL
});

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

async function addDocument(title, content, metadata = {}) {
  // 1. Generate embedding
  const embeddingResponse = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input: content,
    dimensions: 1536
  });

  const embedding = embeddingResponse.data[0].embedding;

  // 2. Store in PostgreSQL
  // pgvector expects a '[x,y,...]' literal, so serialize with JSON.stringify;
  // passing the raw JS array would be sent as a Postgres array ('{...}') and fail to parse
  const result = await pool.query(
    `INSERT INTO documents (title, content, embedding, metadata)
     VALUES ($1, $2, $3, $4)
     RETURNING id`,
    [title, content, JSON.stringify(embedding), JSON.stringify(metadata)]
  );

  return result.rows[0].id;
}

// Example usage
await addDocument(
  "Guide to Machine Learning",
  "Machine learning enables computers to learn from data...",
  { category: "AI", tags: ["ML", "tutorial"] }
);

Step 4: Semantic Search Queries

async function semanticSearch(query, limit = 10) {
  // 1. Generate query embedding
  const queryEmbedding = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input: query,
    dimensions: 1536
  });

  const embedding = queryEmbedding.data[0].embedding;

  // 2. Find similar documents using cosine similarity
  // <=> operator computes cosine distance (1 - cosine_similarity)
  const results = await pool.query(
    `SELECT
       id,
       title,
       content,
       1 - (embedding <=> $1) AS similarity,
       metadata
     FROM documents
     ORDER BY embedding <=> $1
     LIMIT $2`,
    [JSON.stringify(embedding), limit]
  );

  return results.rows;
}

// Search example
const results = await semanticSearch("How does deep learning work?");

results.forEach(doc => {
  console.log(`${doc.title} (${(doc.similarity * 100).toFixed(1)}% match)`);
  console.log(doc.content.substring(0, 150) + "...\n");
});

Advanced: Hybrid Search (Vector + Full-Text)

-- Combine vector similarity with keyword relevance
SELECT
  id,
  title,
  content,
  (0.7 * (1 - (embedding <=> $1))) +  -- 70% weight on semantic similarity
  (0.3 * ts_rank(to_tsvector('english', content), query)) AS score
FROM documents,
     plainto_tsquery('english', $2) query
WHERE to_tsvector('english', content) @@ query  -- Keyword filter
ORDER BY score DESC
LIMIT 10;

-- $1: query embedding vector
-- $2: query text for keyword search
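
From Node.js, running this hybrid query just means supplying both parameters. A sketch reusing the generateEmbedding helper and pool from earlier:

async function hybridSearch(queryText, limit = 10) {
  const queryEmbedding = await generateEmbedding(queryText);

  const results = await pool.query(
    `SELECT id, title, content,
            (0.7 * (1 - (embedding <=> $1))) +
            (0.3 * ts_rank(to_tsvector('english', content), query)) AS score
     FROM documents, plainto_tsquery('english', $2) query
     WHERE to_tsvector('english', content) @@ query
     ORDER BY score DESC
     LIMIT $3`,
    [JSON.stringify(queryEmbedding), queryText, limit]
  );

  return results.rows;
}
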
✨ Build AI applications with confidence

Manage pgvector databases visually with SQL Data Builder

SQL Data Builder provides native support for PostgreSQL pgvector extension. Design vector tables, create HNSW indexes, and query embeddings—all through a visual interface without complex SQL.


Performance Benchmarks: Vector Database Comparison

Real-world performance tests on 1 million 1536-dimensional vectors (OpenAI embeddings):

| Database | Query Latency (p95) | Recall@10 | Index Build Time | Memory Usage |
|----------|---------------------|-----------|------------------|--------------|
| pgvector (HNSW) | 45 ms | 95% | 18 min | 6.2 GB |
| pgvector (IVFFlat) | 120 ms | 92% | 8 min | 4.8 GB |
| Pinecone | 35 ms | 98% | N/A (managed) | N/A (managed) |
| Weaviate | 42 ms | 96% | 22 min | 7.1 GB |
| Milvus | 38 ms | 97% | 15 min | 5.9 GB |
| Qdrant | 40 ms | 96% | 12 min | 5.4 GB |

Key Findings:

  - Pinecone posts the best raw numbers (35 ms p95, 98% recall) but is the only option that cannot be self-hosted
  - pgvector with HNSW sits within ~10 ms of the dedicated engines at 95% recall: competitive for most workloads
  - IVFFlat trades query speed and recall for a much faster index build (8 min vs 18 min) and lower memory use
  - The specialized engines (Milvus, Qdrant, Weaviate) cluster tightly at 38-42 ms and 96-97% recall

Scale Considerations:

pgvector performs well up to around 10 million vectors. Beyond that, consider partitioning (split by tenant or date) or a dedicated vector database. For the large majority of applications, pgvector's integration benefits outweigh the marginal performance gains of specialized solutions.

RAG (Retrieval-Augmented Generation): Production Architecture

RAG combines vector search with LLMs to provide accurate, contextual AI responses. It's the architecture behind ChatGPT's custom GPTs, Notion AI, and GitHub Copilot.

How RAG Works

  1. Indexing: Split documents into chunks, generate embeddings, store in vector DB
  2. Retrieval: User query → embedding → vector search → retrieve relevant chunks
  3. Generation: Inject retrieved chunks into LLM prompt → generate answer

Complete RAG Implementation

// 1. Document chunking and indexing
import fs from 'fs/promises';
import path from 'path';

async function indexDocument(filePath) {
  const content = await fs.readFile(filePath, 'utf-8');

  // Split into 500-token chunks with 50-token overlap
  const chunks = chunkText(content, 500, 50);

  for (let i = 0; i < chunks.length; i++) {
    const embedding = await generateEmbedding(chunks[i]);

    await pool.query(
      `INSERT INTO documents (title, content, embedding, metadata)
       VALUES ($1, $2, $3, $4)`,
      [
        `${path.basename(filePath)} - Chunk ${i + 1}`,
        chunks[i],
        JSON.stringify(embedding),
        JSON.stringify({ source: filePath, chunk: i })
      ]
    );
  }
}
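
// The chunkText helper used above isn't defined anywhere in this guide.
// A minimal sketch that approximates tokens with whitespace-separated words;
// a production version would count real tokens with a tokenizer like tiktoken.
function chunkText(text, chunkSize = 500, overlap = 50) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];

  // Step forward by (chunkSize - overlap) so consecutive chunks share context
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break;  // final chunk reached
  }

  return chunks;
}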

// 2. RAG query function
async function askQuestion(question, conversationHistory = []) {
  // Step 1: Retrieve relevant context
  const relevantDocs = await semanticSearch(question, 5);

  const context = relevantDocs
    .map(doc => doc.content)
    .join('\n\n---\n\n');

  // Step 2: Build prompt with context
  const systemPrompt = `You are a helpful assistant. Answer questions using ONLY the provided context. If the context doesn't contain the answer, say "I don't have enough information to answer that."

Context:
${context}`;

  // Step 3: Generate answer with GPT-4
  const completion = await openai.chat.completions.create({
    model: "gpt-4-turbo-preview",
    messages: [
      { role: "system", content: systemPrompt },
      ...conversationHistory,
      { role: "user", content: question }
    ],
    temperature: 0.7,
    max_tokens: 500
  });

  return {
    answer: completion.choices[0].message.content,
    sources: relevantDocs.map(d => ({ title: d.title, similarity: d.similarity }))
  };
}

// Usage example
const result = await askQuestion(
  "How do I configure HNSW indexes in pgvector?"
);

console.log(result.answer);
console.log('\nSources:', result.sources);

RAG Optimization Techniques

Beyond basic retrieval, common optimizations include metadata filtering (narrow the candidate set before ranking), similarity thresholds (drop weak matches), and chunk-size tuning. A metadata filtering example:

-- Filter by category and recency, then rank by semantic similarity
SELECT
  id,
  title,
  content,
  1 - (embedding <=> $1) AS similarity
FROM documents
WHERE
  metadata->>'category' = 'technical_docs' AND
  created_at >= NOW() - INTERVAL '90 days'
ORDER BY embedding <=> $1
LIMIT 10;
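
One more technique worth a sketch: a similarity floor, so weak matches never reach the LLM prompt. This builds on the semanticSearch function from earlier; the 0.75 threshold is an illustrative starting point, not a universal constant:

async function retrieveWithThreshold(question, limit = 5, minSimilarity = 0.75) {
  // Over-fetch, then keep only chunks genuinely close to the query;
  // padding the prompt with marginal matches invites hallucination
  const candidates = await semanticSearch(question, limit * 2);
  const filtered = candidates.filter(doc => doc.similarity >= minSimilarity);

  return filtered.slice(0, limit);
}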

Real-World Use Cases: What Companies Are Building

1. Customer Support Chatbots (RAG)

Help-center articles and resolved tickets are embedded and retrieved as context, so the bot answers from actual documentation instead of guessing. Example: Intercom, Zendesk AI

2. Semantic Code Search

Functions and files are embedded so a query like "where do we validate tokens?" finds relevant code without matching exact identifiers. Example: GitHub Copilot, Sourcegraph Cody

3. Recommendation Systems

Users and items share one embedding space, so "more like this" becomes a nearest-neighbor query. Example: Spotify, Netflix

4. E-commerce Visual Search

Product photos are embedded with image models, letting shoppers search with a picture instead of keywords. Example: Pinterest Lens, Google Lens

5. Legal Document Analysis

Contracts and case law are chunked and embedded so researchers surface conceptually relevant precedent, not just keyword hits. Example: Harvey AI, Casetext

Distance Metrics Explained: When to Use Each

Vector similarity search uses distance metrics to find "nearby" vectors:

Cosine Similarity (Most Common)

Measures the angle between two vectors while ignoring magnitude. It's the default for text embeddings; OpenAI's embeddings are normalized to unit length, so cosine ranking is the natural fit.

-- pgvector operator: <=> (cosine distance, lower = more similar)
SELECT * FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 10;

Euclidean Distance (L2)

Straight-line distance in the embedding space. Prefer it when vector magnitude carries meaning, for example unnormalized image feature vectors.

-- pgvector operator: <-> (L2 distance, lower = more similar)
SELECT * FROM documents
ORDER BY embedding <-> query_embedding
LIMIT 10;

Inner Product

The dot product of two vectors. On normalized vectors it produces the same ranking as cosine similarity while being slightly cheaper to compute. Note that pgvector's <#> operator returns the negative inner product (indexes only support ascending order), so you sort ascending, not DESC:

-- pgvector operator: <#> (negative inner product, lower = more similar)
SELECT * FROM documents
ORDER BY embedding <#> query_embedding
LIMIT 10;

✨ Trusted by AI developers worldwide

Build production AI apps with PostgreSQL + pgvector

SQL Data Builder makes it easy to design, test, and optimize vector search systems. Visualize embedding distributions, benchmark index performance, and export production-ready schemas—all without writing complex SQL.


Index Optimization: HNSW vs IVFFlat

pgvector supports two index types with different performance characteristics:

HNSW (Hierarchical Navigable Small World)

-- Create HNSW index (recommended for most use cases)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Parameters:
-- m: Max connections per node (default 16, higher = better recall, more memory)
-- ef_construction: Size of candidate list (default 64, higher = better index quality)

Characteristics:

  - Best query latency and recall of the two index types
  - Slower to build and uses more memory (the graph structure lives alongside the data)
  - Can be created on an empty table; the index grows as rows are inserted
  - Recall vs speed tunable at query time via hnsw.ef_search

IVFFlat (Inverted File with Flat Compression)

-- Create IVFFlat index
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- lists parameter: sqrt(row_count) is a good starting point
-- Example: 1M rows → lists = 1000, 10M rows → lists = 3162

Characteristics:

  - Much faster index build and lower memory footprint
  - Lower recall than HNSW at comparable query latency
  - Should be created after the table has data: the lists are clusters trained on existing rows
  - Recall vs speed tunable at query time via ivfflat.probes

Query-Time Tuning

-- Increase recall at query time (HNSW)
SET hnsw.ef_search = 100;  -- Default 40, higher = better recall, slower queries

-- Increase recall at query time (IVFFlat)
SET ivfflat.probes = 10;  -- Default 1, higher = more clusters searched
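
Both settings are per-session, so application code must issue them on the same connection that runs the search. A sketch using node-postgres, where SET LOCAL scopes the change to a single transaction (assuming the pool from earlier):

async function searchWithHighRecall(queryEmbedding, limit = 10) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    // SET LOCAL applies only inside this transaction,
    // so it won't leak onto the pooled connection
    await client.query('SET LOCAL hnsw.ef_search = 100');

    const results = await client.query(
      `SELECT id, title, 1 - (embedding <=> $1) AS similarity
       FROM documents
       ORDER BY embedding <=> $1
       LIMIT $2`,
      [JSON.stringify(queryEmbedding), limit]
    );

    await client.query('COMMIT');
    return results.rows;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}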

Production Best Practices

1. Dimension Reduction for Cost Savings

// Reduce embedding dimensions (trade accuracy for cost/speed)
const embedding = await openai.embeddings.create({
  model: "text-embedding-3-large",
  input: text,
  dimensions: 1024  // Down from 3072: ~67% less vector storage, faster queries
});

// Accuracy loss: typically 1-3% for most applications
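
The storage savings are easy to sanity-check: pgvector stores each dimension as a 4-byte float, so raw vector size scales linearly with dimensions (index overhead comes on top):

// Approximate raw vector storage, excluding index overhead
const bytesPerVector = dims => 4 * dims;  // float4 per dimension

bytesPerVector(3072) * 1_000_000 / 1e9;   // ≈ 12.3 GB for 1M full-size vectors
bytesPerVector(1024) * 1_000_000 / 1e9;   // ≈ 4.1 GB at 1024 dims (~67% less)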

2. Batch Processing

// Process embeddings in batches (up to 2048 inputs per request)
const texts = [...]; // Array of 1000 texts

const batchSize = 100;
for (let i = 0; i < texts.length; i += batchSize) {
  const batch = texts.slice(i, i + batchSize);

  const embeddings = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input: batch,
    dimensions: 1536
  });

  // Bulk insert to PostgreSQL
  const values = embeddings.data.map((emb, idx) =>
    [batch[idx], JSON.stringify(emb.embedding)]
  );

  await pool.query(
    `INSERT INTO documents (content, embedding)
     SELECT * FROM UNNEST($1::text[], $2::vector[])`,
    [values.map(v => v[0]), values.map(v => v[1])]
  );
}

3. Monitoring and Observability

-- Monitor index performance
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,  -- Number of index scans
  idx_tup_read,  -- Tuples read by index
  idx_tup_fetch  -- Tuples fetched by index
FROM pg_stat_user_indexes
WHERE indexname LIKE '%embedding%';

-- Check index size
SELECT
  pg_size_pretty(pg_relation_size('documents_embedding_idx')) AS index_size;

4. Multi-Tenancy Pattern

-- Partition by tenant for isolation and performance
CREATE TABLE documents (
  id SERIAL,
  tenant_id INTEGER NOT NULL,
  content TEXT,
  embedding vector(1536),
  PRIMARY KEY (tenant_id, id)
) PARTITION BY HASH (tenant_id);

-- Create partitions (8 partitions example)
CREATE TABLE documents_0 PARTITION OF documents FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE documents_1 PARTITION OF documents FOR VALUES WITH (MODULUS 8, REMAINDER 1);
-- ... create remaining partitions

-- Create index on each partition
CREATE INDEX ON documents_0 USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON documents_1 USING hnsw (embedding vector_cosine_ops);
-- ...
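
Queries that filter on tenant_id touch only one partition, which keeps each HNSW index small. A sketch of a tenant-scoped search from application code (assuming the earlier pool setup):

async function tenantSearch(tenantId, queryEmbedding, limit = 10) {
  // The tenant_id filter lets the planner prune to a single partition,
  // so the similarity scan runs against one small HNSW index
  const results = await pool.query(
    `SELECT id, content, 1 - (embedding <=> $2) AS similarity
     FROM documents
     WHERE tenant_id = $1
     ORDER BY embedding <=> $2
     LIMIT $3`,
    [tenantId, JSON.stringify(queryEmbedding), limit]
  );
  return results.rows;
}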

Future of Vector Databases: 2025 Trends

1. Multi-Modal Embeddings

Combining text, image, audio, and video embeddings in a single unified search, so one query can retrieve across all content types.

2. Sparse-Dense Hybrid Search

Combining sparse vectors (BM25-style keyword signals) with dense embeddings for the best of both worlds: exact-term matching plus semantic recall, as in the hybrid SQL query shown earlier.

3. Graph + Vector Hybrid

Weaviate and Neo4j are pioneering vector search over knowledge graphs, letting similarity search follow explicit relationships between entities.

4. Edge Deployment

On-device vector search that keeps data local for privacy and removes network round-trips from query latency.

Conclusion: Choosing the Right Vector Database

Start with pgvector if:

  - You already run PostgreSQL and want vectors next to your relational data
  - Your corpus is under roughly 10 million vectors
  - You want SQL joins, filters, and transactions alongside similarity search

Choose Pinecone if:

  - You want a fully managed service with auto-scaling and a 99.9% SLA
  - You'd rather pay a monthly fee than operate search infrastructure

Choose Weaviate if:

  - You need hybrid (keyword + vector) or multi-modal search out of the box
  - A GraphQL API and built-in ML models fit your stack

Choose Milvus/Qdrant if:

  - You're self-hosting at very large scale, into the hundreds of millions or billions of vectors
  - You need GPU acceleration (Milvus) or fast filtered search over payloads (Qdrant)

The vector database market is rapidly maturing, with pgvector emerging as the pragmatic choice for most developers. Its PostgreSQL integration, zero licensing costs, and growing ecosystem (including visual tools like SQL Data Builder) make it the default starting point for AI applications in 2025.

As embeddings become the universal interface for AI systems—from search to recommendations to generative AI—mastering vector databases is no longer optional. It's the foundational skill for building the next generation of intelligent applications.
