Vector Databases & Approximate Nearest Neighbors (ANN)
Learn what vector databases store, why nearest-neighbor search must be approximate at scale, and how ANN indexes (like HNSW and IVF) make retrieval fast.
What Are Vector Databases & Approximate Nearest Neighbors (ANN)?
A vector database is a system designed to store and search vectors (embeddings) efficiently. If embeddings are “coordinates for meaning,” then a vector database is the thing that can quickly answer: “Which stored items are closest to this query in meaning-space?”
Analogy: imagine a gigantic museum warehouse with millions of objects. Each object has a “location” not in a building, but on a concept map: all “refund policy” things are near each other, “neural networks” things form another cluster, and “Finland winter photography tips” sits somewhere else. When a visitor asks a question, you want to find the closest objects on this map—fast.
That “find the closest” problem is called nearest neighbor search.
Now the catch: if you have 10 million vectors, the simplest exact approach is to compute the distance from the query to every vector and pick the top results. That’s accurate… and often too slow.
So most real systems use Approximate Nearest Neighbor (ANN) search: you accept results that are very likely to be the true nearest neighbors, but not mathematically guaranteed—because the speedup is enormous.
So the combined concept is:
- Vector databases = storage + indexing + metadata + filtering + fast similarity search for embeddings.
- ANN = the set of indexing/search techniques that make similarity search practical at large scale.
Why Does It Matter?
Because embeddings are only useful if you can retrieve relevant items quickly.
This matters in practice for three big reasons:
- RAG lives or dies on retrieval. Retrieval-Augmented Generation needs to fetch the best chunks for the LLM. If retrieval is slow or inaccurate, the assistant becomes slow or wrong.
- Scale breaks “just compute everything.” Exact search requires comparing against every vector. That’s fine for thousands of vectors. It becomes painful for millions and brutal for billions.
- You need more than similarity: you need systems behavior. Real apps need:
- filters (“only documents you’re allowed to see”),
- freshness constraints (“updated after 2025”),
- multi-tenant isolation,
- high write throughput (new docs constantly),
- monitoring and evaluation.
Vector databases bundle these concerns around vector search, instead of making you duct-tape them onto a custom script.
How It Works
Let’s build the mental model from “slow but simple” to “fast and approximate.”
1) The basic query: “What’s close to this vector?”
You embed a query into a vector q and compare it to stored vectors d₁…dₙ.
Similarity is often computed using cosine similarity: cos(q, dᵢ) = (q · dᵢ) / (‖q‖ ‖dᵢ‖).
Intuition: treat vectors like arrows. Cosine similarity measures how aligned the arrows are (same direction = similar meaning).
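To make that concrete, here is a minimal NumPy sketch; the two example vectors are made up for illustration.

```python
import numpy as np

def cosine_similarity(q: np.ndarray, d: np.ndarray) -> float:
    """Cosine of the angle between q and d: ~1.0 = same direction, ~0.0 = unrelated."""
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

q = np.array([0.2, 0.9, 0.1])   # query embedding (made up)
d = np.array([0.25, 0.85, 0.05])  # stored embedding (made up)
print(cosine_similarity(q, d))  # close to 1.0 -> very similar direction
```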
2) Why exact search gets slow
Exact k-nearest neighbors (kNN) means scoring every stored vector.
If you have:
- n vectors,
- each vector of dimension d,
then a naive scan costs roughly n × d operations per query.
At 10 million vectors and d=768, even optimized code starts to sweat—especially if you need low latency.
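For intuition, a brute-force scan looks like this in NumPy; the corpus is random and the sizes are illustrative. Every query touches all n × d numbers, which is exactly the cost ANN indexes try to avoid.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100_000, 768                                 # illustrative sizes
db = rng.standard_normal((n, d)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)     # normalize so dot product == cosine

def exact_knn(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Score every stored vector (roughly n*d operations) and return the top-k indices."""
    scores = db @ query                             # one dot product per stored vector
    return np.argsort(-scores)[:k]

q = rng.standard_normal(d).astype(np.float32)
q /= np.linalg.norm(q)
print(exact_knn(q))
```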
3) ANN idea: don’t search everything, search the right places
ANN methods speed things up by using an index that narrows the search.
There are several common strategies (vector databases often support multiple):
A) Partitioning / “inverted file” (IVF)
Think “neighborhoods.”
- Cluster your vectors into groups (often via k-means).
- Store which vectors belong to which cluster.
- For a query, find the nearest cluster centroids.
- Search only inside those clusters.
Intuition: you don’t search every house in the country; you first drive to the right neighborhood.
This trades a small accuracy risk for huge speed improvements.
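As a sketch of the IVF idea using the FAISS library (see Further Reading), with illustrative sizes: nlist controls how many clusters exist, and nprobe controls how many clusters are actually visited per query.

```python
import faiss
import numpy as np

rng = np.random.default_rng(0)
d, n = 128, 50_000
xb = rng.standard_normal((n, d)).astype(np.float32)   # stored vectors (random placeholders)
xq = rng.standard_normal((1, d)).astype(np.float32)    # one query

nlist = 256                                   # number of clusters ("neighborhoods")
quantizer = faiss.IndexFlatL2(d)              # used to assign vectors to centroids
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(xb)                               # k-means over the data to learn centroids
index.add(xb)

index.nprobe = 8                              # search only the 8 nearest clusters
distances, ids = index.search(xq, 5)
print(ids)
```

Raising nprobe improves recall but costs latency; that knob is the speed–quality dial mentioned above.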
B) Graph-based search (HNSW)
Think “a road network.”
Vectors are nodes in a graph, connected to other nearby vectors. To search:
- Start at an entry point.
- Greedily move to neighbors that get closer to the query.
- Keep exploring a small candidate set.
HNSW adds a hierarchy (multiple layers) so you can “zoom out” for long jumps and “zoom in” for local precision—like highways + local streets.
Graph ANN methods are popular because they often deliver excellent recall (quality) at low latency.
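The greedy step at the heart of graph search is easy to sketch in plain Python. This toy version walks a single pre-built neighbor graph; real HNSW adds the layer hierarchy, a dynamic candidate list, and careful graph construction.

```python
import numpy as np

def greedy_search(graph: dict[int, list[int]], vectors: np.ndarray,
                  query: np.ndarray, entry: int) -> int:
    """Walk the graph, always hopping to whichever neighbor is closest to the query."""
    current = entry
    while True:
        neighbors = graph[current]
        # Distances from the query to the current node and to each of its neighbors.
        dists = np.linalg.norm(vectors[[current] + neighbors] - query, axis=1)
        best = int(np.argmin(dists))
        if best == 0:                       # no neighbor is closer: a local optimum
            return current
        current = neighbors[best - 1]       # greedily move toward the query

# Tiny toy graph over 4 points on a line; node 0 is the entry point.
vectors = np.array([[0.0], [1.0], [2.0], [3.0]])
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_search(graph, vectors, np.array([2.4]), entry=0))   # -> 2
```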
C) Compression (Product Quantization / PQ)
Think “store a cheaper sketch of each vector.”
Instead of storing full-precision vectors, you store a compressed code that approximates them. That:
- reduces memory,
- speeds up distance computations,
- sometimes enables searching larger-than-RAM collections.
This is especially helpful at very large scales, but it can reduce accuracy if over-compressed.
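A rough sketch of the product-quantization idea, using scikit-learn's KMeans to learn one small codebook per subvector block; real implementations (such as FAISS's) are far more optimized.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
d, m, bits = 128, 8, 8                 # 8 subvectors, 2^8 = 256 centroids each
sub_d = d // m
xb = rng.standard_normal((10_000, d)).astype(np.float32)   # random placeholder vectors

# One small codebook (set of centroids) per subvector block.
codebooks = [KMeans(n_clusters=2**bits, n_init=4).fit(xb[:, i*sub_d:(i+1)*sub_d])
             for i in range(m)]

def encode(x: np.ndarray) -> np.ndarray:
    """Replace each subvector with the id of its nearest centroid: 128 floats -> 8 bytes."""
    return np.array([cb.predict(x[i*sub_d:(i+1)*sub_d][None])[0]
                     for i, cb in enumerate(codebooks)], dtype=np.uint8)

print(encode(xb[0]))   # 8 small integers instead of 512 bytes of float32
```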
D) Hybrid approaches
Real systems often combine these:
- Partition to reduce the search region,
- Graph search inside partitions,
- Compression to fit more vectors and compute faster.
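In FAISS, for instance, such a combination can be requested with a single factory string; the parameters below are illustrative, not recommendations.

```python
import faiss

d = 768
# IVF partitioning (4096 clusters) + product quantization (96 sub-codes of 8 bits each).
# The index must be trained on a representative sample before vectors are added.
index = faiss.index_factory(d, "IVF4096,PQ96")
```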
4) The vector database wraps the index in “app features”
Beyond the ANN algorithm, a vector database typically provides:
- Upserts: insert/update vectors and their metadata.
- Metadata filtering: “only results where customerId = X” or “docType = policy”.
- Namespaces / multi-tenancy: isolate different users/teams.
- Replication & durability: don’t lose your index when a node dies.
- Operational knobs: configure recall/latency trade-offs.
A good mental model: ANN is the engine; the vector database is the car (engine + steering + brakes + seatbelts + dashboard).
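APIs differ across products, but the shape of a filtered query is roughly the same everywhere. The toy in-memory “store” below is only meant to show metadata filtering combined with similarity ranking, not any particular product’s API.

```python
import numpy as np

# Toy in-memory "vector DB" records: embedding + metadata, purely illustrative.
records = [
    {"id": 1, "vec": np.array([0.9, 0.1]), "meta": {"docType": "policy", "tenant": "acme"}},
    {"id": 2, "vec": np.array([0.1, 0.9]), "meta": {"docType": "blog",   "tenant": "acme"}},
    {"id": 3, "vec": np.array([0.8, 0.2]), "meta": {"docType": "policy", "tenant": "other"}},
]

def filtered_search(query: np.ndarray, where: dict, k: int = 2) -> list[int]:
    """Apply the metadata filter first, then rank the survivors by cosine similarity."""
    allowed = [r for r in records if all(r["meta"].get(f) == v for f, v in where.items())]
    allowed.sort(key=lambda r: -np.dot(r["vec"], query)
                               / (np.linalg.norm(r["vec"]) * np.linalg.norm(query)))
    return [r["id"] for r in allowed[:k]]

print(filtered_search(np.array([1.0, 0.0]), {"docType": "policy", "tenant": "acme"}))  # -> [1]
```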
5) One concrete end-to-end workflow
- Chunk documents.
- Compute embeddings for each chunk.
- Store {embedding, text, metadata} in the vector DB.
- Query arrives → embed it.
- Run ANN search → get top-k chunks.
- (Optional) rerank the top-k with a stronger model.
- Use retrieved chunks for RAG, “related articles,” or recommendations.
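A toy end-to-end version of that workflow, using a bag-of-words stand-in for a real embedding model and a plain list as a stand-in for the vector database:

```python
import numpy as np

VOCAB = ["refund", "refunds", "days", "hnsw", "graph", "layered", "long", "take"]

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words 'embedding' -- a stand-in for a real embedding model."""
    words = text.lower().strip("?.").split()
    v = np.array([float(words.count(w)) for w in VOCAB])
    return v / (np.linalg.norm(v) or 1.0)

# 1-3) Chunk documents, compute embeddings, store {embedding, text, metadata}.
chunks = ["Refunds are issued within 14 days.", "HNSW uses a layered graph."]
store = [{"vec": embed(c), "text": c, "meta": {"docType": "policy"}} for c in chunks]

# 4-5) A query arrives: embed it, then run top-k similarity search (exact here).
q = embed("How long do refunds take?")
top_k = sorted(store, key=lambda r: -float(np.dot(r["vec"], q)))[:1]

# 6-7) The retrieved chunks would then be (optionally) reranked and passed to the LLM.
print(top_k[0]["text"])   # -> "Refunds are issued within 14 days."
```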
Key Terminology
- Vector database: A system for storing embeddings plus metadata and performing fast similarity search, often with filtering and operational features.
- kNN (k-nearest neighbors): The exact top-k closest vectors to a query (accurate, often too slow at scale).
- ANN (Approximate Nearest Neighbors): Methods that return “near enough” neighbors much faster by using an index and heuristics.
- Recall@k: A quality metric: out of the true nearest k items, what fraction did your ANN search return? (See the small example after this list.)
- Index / indexing method: The data structure enabling fast search (e.g., IVF partitions, HNSW graphs, PQ compression).
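To make Recall@k concrete, a minimal computation; the id lists are made up.

```python
def recall_at_k(ann_ids: list[int], exact_ids: list[int]) -> float:
    """Fraction of the true top-k neighbors that the ANN search also returned."""
    return len(set(ann_ids) & set(exact_ids)) / len(exact_ids)

# Exact top-5 is {1, 7, 9, 12, 40}; the ANN index returned {1, 7, 9, 13, 40}.
print(recall_at_k([1, 7, 9, 13, 40], [1, 7, 9, 12, 40]))   # -> 0.8
```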
Real-World Applications
- RAG for internal knowledge assistants: Retrieve the best policy paragraphs or engineering docs before generating an answer.
- Semantic search on websites: “Search by meaning” across articles, FAQs, documentation, or notes.
- Recommendations: Similar products, similar support tickets, similar code snippets.
- Deduplication / near-duplicate detection: Find “this looks like something we already have.”
- Anomaly detection: Items far from any cluster can indicate unusual events or data problems.
Common Misconceptions
- “ANN means the results are basically wrong.” ANN means “fast with a controllable trade-off.” With good indexes and tuning, results can be extremely close to exact while staying low-latency. You choose where you sit on the speed–quality curve.
- “A vector database is just Postgres with a vector column.” You can store vectors in general databases, but vector databases focus on specialized ANN indexes, filtering, scaling, and operational concerns that become painful as data grows.
- “If semantic search is good, filters don’t matter.” Filters are non-negotiable in real apps: permissions, tenancy, document type, freshness, and lifecycle rules. A system that retrieves “great chunks you’re not allowed to see” is not great.
Further Reading
- HNSW: Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs (Malkov & Yashunin).
- FAISS: The Faiss library (Douze et al.) — a practical toolkit covering multiple ANN strategies (graphs, IVF, compression).
- ScaNN: Google’s overview of scalable vector similarity search and the speed/accuracy trade-offs.
Interactive: ANN Search Comparison
Click anywhere to place a query point, then switch between search methods to see how each one finds nearest neighbors.
Interactive: HNSW Layer Explorer
Step through the HNSW algorithm layer by layer. Watch how it navigates from highways to local streets.