Learn what a vector database is, how embeddings enable similarity search, and when to choose pgvector, Pinecone, or Weaviate for AI search and RAG.

A vector database is a system built to store and search embeddings—lists of numbers that represent the “meaning” of text, images, or other data. Instead of asking, “Does this record contain the exact word I typed?”, you ask, “Which records are most similar to this question?” and get back the closest matches.
Imagine every document (or product, ticket, or FAQ) is turned into a point on a map. Items about the same idea end up near each other—even if they use different words. A vector database is the tool that can quickly answer: what’s nearest to this new point?
Traditional SQL databases are great when you know the structure of your question: filter by date, user_id, status, and so on. Keyword search is great when the right answer literally contains the same words you type.
Vector databases are different because they focus on semantic similarity. They’re designed to handle queries like “How do I get my money back?” and find content that says “Our refund policy…” without requiring the exact same wording.
This doesn’t replace SQL or keyword search. In many real systems, you use both: SQL/filters for business rules (region, permissions, recency) and vector search for “meaning.”
If you remember one line: a vector database is a “most similar items” engine for embeddings, optimized to do that fast and at scale.
Vector databases work because embeddings let you compare meaning numerically. You don’t read the numbers; you use them to rank “how close” two pieces of content are.
An embedding is a list of numbers (often hundreds or thousands long) that represents a piece of content. Each number captures some aspect of meaning learned by a machine‑learning model. You don’t interpret the individual numbers directly; what matters is that similar content ends up with similar number patterns.
Think of it like coordinates on a very high‑dimensional map: sentences about “refund policy” and “returning a product” land near each other, even if they use different words.
Different embedding models turn different media (text, images, audio, and so on) into vectors.
Once everything is a vector, your database can search across large collections using the same core operation: “find the closest vectors.”
To decide what’s “closest,” systems use simple scoring rules such as cosine similarity and dot product.
You don’t need to compute these by hand—the important part is that higher scores mean “more alike.”
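To make that concrete, here is a minimal sketch (Python with NumPy) of how cosine similarity ranks a query against a few toy vectors. Real embeddings have hundreds or thousands of dimensions, and the numbers below are invented purely for illustration.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Direction-only comparison: values near 1.0 mean "pointing the same way".
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" (real ones are much longer).
refund_policy  = np.array([0.9, 0.1, 0.0, 0.2])
return_product = np.array([0.8, 0.2, 0.1, 0.3])
pasta_recipe   = np.array([0.0, 0.9, 0.8, 0.1])

query = np.array([0.85, 0.15, 0.05, 0.25])  # "How do I get my money back?"

for name, vec in [("refund_policy", refund_policy),
                  ("return_product", return_product),
                  ("pasta_recipe", pasta_recipe)]:
    print(name, round(cosine_similarity(query, vec), 3))
# Higher score = more alike; the refund/return chunks outrank the recipe.
```

The absolute numbers don’t matter; what matters is the ordering they produce.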
Most search quality wins come from better embeddings and better chunking, not swapping databases. If your model doesn’t capture your domain language (product names, internal jargon, legal phrasing), even the best vector index can only return “closest wrong answers.” Choosing pgvector vs Pinecone vs Weaviate matters, but choosing the right embedding model and input format usually matters more.
Keyword search, SQL queries, and vector search solve different problems—mixing them up is a common source of disappointing results.
Traditional search (Elasticsearch, Postgres full-text, etc.) matches words and phrases. It’s great when users know what to type and the document contains those terms.
It struggles when users phrase the same intent differently: synonyms, paraphrases, or questions that don’t share any words with the answer (“How do I get my money back?” vs. “Our refund policy…”).
A vector database stores embeddings—numeric representations of meaning. Queries are also embedded, and results are ranked by similarity, so you can retrieve conceptually related content even when the exact words don’t match. This is why vector search is popular for semantic search and RAG (retrieval-augmented generation).
SQL is the right tool for structured, exact questions: IDs, joins, aggregations, and strict filters.
Vectors are a poor fit when precision is non-negotiable (e.g., “orders for customer_id = 123”).
Even with semantic search, you usually need classic filters—price ranges, dates, language, category, and permissions. Most real systems do a hybrid: SQL/metadata filters first, then vector similarity ranking within the allowed set.
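As a rough sketch of that hybrid pattern, using pgvector as one example backend: the query below applies business filters first and then ranks the remaining rows by similarity. The `chunks` table, column names, and the `get_embedding` helper are hypothetical placeholders for your own schema and embedding call.

```python
import psycopg  # psycopg 3

# Hypothetical helper that calls your embedding model and returns a list of floats.
query_embedding = get_embedding("How do I get my money back?")
qvec = "[" + ",".join(str(x) for x in query_embedding) + "]"  # pgvector literal, e.g. '[0.1,0.2,...]'

sql = """
    SELECT id, content, embedding <=> %(qvec)s::vector AS distance  -- <=> is pgvector's cosine distance
    FROM chunks
    WHERE tenant_id = %(tenant)s                     -- business rules first...
      AND category = 'help_center'
      AND created_at > now() - interval '90 days'
    ORDER BY embedding <=> %(qvec)s::vector          -- ...then rank the allowed set by meaning
    LIMIT 10
"""

with psycopg.connect("dbname=app") as conn:   # placeholder connection string
    rows = conn.execute(sql, {"qvec": qvec, "tenant": "acme"}).fetchall()
```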
When you store data in a vector database, each item becomes a long list of numbers (an embedding). Searching then means: “find the vectors that are closest to this query vector.”
A realistic database might hold millions of vectors. Comparing your query to every vector would be too slow and too expensive. So vector databases build an index—a structure that helps narrow down the candidates quickly, so the system only measures distances for a small subset.
Most vector search uses approximate nearest neighbor (ANN). “Approximate” means the database tries to find very good matches fast, rather than guaranteeing the mathematically perfect top result every time.
A helpful analogy: instead of checking every book in a library, ANN uses a smart map to walk you to the right shelves first.
This trade-off is usually tuned with settings like “how hard should the index search?”
Practically, recall is “how often the results include what a human would consider the right answers.” For RAG, higher recall often reduces missing key facts (but can cost more).
Different products (pgvector, Pinecone, Weaviate) expose these ideas with different defaults and tuning knobs, but the goal is the same: fast similarity search with controllable accuracy.
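For example, with pgvector’s HNSW index the main query-time knob is `hnsw.ef_search` (how many candidates the index examines). The sketch below assumes a hypothetical `chunks` table with a vector column; Pinecone and Weaviate expose comparable settings through their own configuration.

```python
import psycopg

qvec = "[0.12,-0.03,0.44]"  # placeholder vector literal; in practice built from your query embedding

with psycopg.connect("dbname=app") as conn:
    # Higher ef_search = the index checks more candidates: better recall, slower queries.
    conn.execute("SET hnsw.ef_search = 100")
    rows = conn.execute(
        "SELECT id FROM chunks ORDER BY embedding <=> %s::vector LIMIT 10",
        (qvec,),
    ).fetchall()
```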
A vector database workflow is mostly a “store things, then retrieve the best matches” loop. The key is that you store meaning (embeddings) alongside the original content so search can match ideas, not just exact words.
You start by collecting documents (pages, PDFs, tickets, product descriptions, etc.), splitting them into chunks, and generating an embedding for each chunk.
In the database you typically store the embedding vector, the chunk text (or a pointer to it), a source/document ID, and the metadata you plan to filter on.
At search time, you embed the user’s query and ask for the nearest vectors.
Many teams blend vector similarity with keyword scoring (BM25-like) so you get semantic matches and still reward exact terms like SKU codes, names, or error strings.
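One simple way to blend the two rankings is reciprocal rank fusion (RRF). The sketch below is a generic illustration, not tied to any particular product, and the document IDs are invented.

```python
def reciprocal_rank_fusion(keyword_ranked: list[str], vector_ranked: list[str], k: int = 60) -> list[str]:
    """Blend two ranked lists of document IDs; k dampens the impact of low ranks."""
    scores: dict[str, float] = {}
    for ranked in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: an exact SKU match wins on keywords, refund docs win semantically.
keyword_hits = ["sku-4431-page", "error-e501-doc", "refund-policy"]
vector_hits  = ["refund-policy", "returns-howto", "sku-4431-page"]
print(reciprocal_rank_fusion(keyword_hits, vector_hits)[:3])
```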
Before or during retrieval, apply metadata filters—especially for multi-tenant apps and permissions. Filters also help precision (e.g., “only last 90 days,” “only in Help Center”).
A common pattern is: retrieve top 50–200 fast, then re-rank the top 10–20 using a stronger model or rules (freshness boosts, source priority).
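A minimal sketch of that second stage, using rule-based boosts; the field names and boost values are hypothetical, and many teams use a cross-encoder re-ranking model here instead of rules.

```python
from datetime import datetime, timezone

def rerank(candidates: list[dict], now: datetime | None = None) -> list[dict]:
    """Re-score top candidates with simple business rules on top of similarity.

    Each candidate is assumed to carry a similarity score (0..1), a timezone-aware
    updated_at timestamp, and a source label.
    """
    now = now or datetime.now(timezone.utc)
    for c in candidates:
        score = c["similarity"]
        if (now - c["updated_at"]).days < 90:     # freshness boost
            score += 0.05
        if c["source"] == "help_center":          # source priority
            score += 0.10
        c["final_score"] = score
    return sorted(candidates, key=lambda c: c["final_score"], reverse=True)

# Typical use: fetch the top 100 by similarity, then keep rerank(candidates)[:10].
```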
For RAG, you take the final top chunks and send them as context to an LLM prompt, often with citations and a “don’t answer if not found” instruction. The result is an answer grounded in your stored content, not the model’s guess.
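A sketch of that prompt-assembly step, assuming you already have the final top chunks; the exact instruction wording is just one example.

```python
def build_rag_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a grounded prompt: numbered context with source tags, then the question."""
    context = "\n\n".join(
        f"[{i + 1}] (source: {c['source_id']})\n{c['text']}" for i, c in enumerate(chunks)
    )
    return (
        "Answer the question using ONLY the context below. "
        "Cite sources as [1], [2], ... If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "How do I get my money back?",
    [{"source_id": "help/refunds", "text": "Our refund policy allows returns within 30 days..."}],
)
# Send `prompt` to whichever LLM you use; the model answers from your content, not from memory.
```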
If your goal is to validate retrieval quality quickly (instead of spending weeks wiring infrastructure), a vibe-coding platform like Koder.ai can help you prototype an end-to-end semantic search or RAG app from a chat interface. In practice, that means you can stand up a React UI, a Go backend, and a Postgres database (including a pgvector-based approach) and iterate using planning mode, snapshots, and rollback—then export the source code when you’re ready.
pgvector is a PostgreSQL extension that lets you store and search embedding vectors directly in your existing database. Instead of running a separate “vector database,” you add a new column type (a vector) to the same tables that already hold your users, products, documents, and metadata.
pgvector shines for teams already committed to Postgres and wanting fewer moving parts. If your app’s truth is in Postgres, keeping vectors there can simplify architecture: one backup strategy, one access-control model, one place to run migrations, and familiar SQL for joins and filtering.
The biggest win is putting structured data and vectors together. You can do a semantic search and still apply “normal” constraints—like tenant_id, category, status, or permissions—without stitching results across systems. Operationally, it can be simpler to ship: your existing Postgres deployment plus an extension.
High-volume vector workloads can push Postgres in ways it wasn’t originally tuned for. You’ll likely need to think about vector indexes (commonly IVFFlat or HNSW), memory settings, vacuum behavior, and query patterns.
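For orientation, a table with a vector column and an HNSW index could look like the sketch below, using psycopg. The dimension (1536), table name, and index name are assumptions; IVFFlat is created with a similar `USING ivfflat` clause.

```python
import psycopg

with psycopg.connect("dbname=app") as conn:   # placeholder connection string
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS chunks (
            id        bigserial PRIMARY KEY,
            tenant_id text NOT NULL,
            content   text NOT NULL,
            embedding vector(1536) NOT NULL   -- dimension must match your embedding model
        )
    """)
    # HNSW index for cosine distance; IVFFlat is pgvector's other common index type.
    conn.execute("""
        CREATE INDEX IF NOT EXISTS chunks_embedding_idx
        ON chunks USING hnsw (embedding vector_cosine_ops)
    """)
    conn.commit()
```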
If you expect very large embedding collections, heavy concurrent similarity search, or rapid growth, scaling and tuning can become more hands-on than with a managed vector service. For many teams, pgvector is the “start simple” option that can still go surprisingly far.
Pinecone is a fully managed vector database service: you send it embeddings (vectors) plus IDs and metadata, and it gives you fast similarity search with operational work largely handled for you.
With Pinecone, you typically don’t worry about provisioning machines, tuning low-level index settings day to day, or building your own scaling and failover story. You interact with an API to upsert vectors, query for nearest neighbors, and filter results by metadata (for example: language, tenant, document type, or access level).
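A rough sketch with the Pinecone Python client, assuming an index named “docs” already exists and `embed()` is a stand-in for your embedding call; exact client details vary by SDK version and plan.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("docs")   # assumes the index was created already

# Upsert: each record is an ID, the embedding values, and filterable metadata.
index.upsert(
    vectors=[{
        "id": "help-refunds-0001",
        "values": embed("Our refund policy allows returns within 30 days."),  # hypothetical helper
        "metadata": {"tenant": "acme", "doc_type": "help_center", "lang": "en"},
    }],
    namespace="acme",
)

# Query: embed the question, then ask for nearest neighbors within a metadata filter.
results = index.query(
    vector=embed("How do I get my money back?"),
    top_k=5,
    filter={"doc_type": {"$eq": "help_center"}, "lang": {"$eq": "en"}},
    include_metadata=True,
    namespace="acme",
)
```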
Pinecone is a strong choice when you need to ship retrieval features quickly, handle large collections and high query volume without running infrastructure yourself, and keep operational work to a minimum.
Teams often pick it when the core product depends on high-quality retrieval and they want “vector search as a service” rather than another system to maintain.
Pinecone’s biggest advantage is speed-to-production. Managed scaling and reliability features (varying by plan) reduce the time you spend on capacity planning and incident response. It also tends to integrate cleanly with common AI stacks for search and RAG.
The main trade-offs are vendor lock-in concerns and ongoing usage costs that can rise with query volume, storage, and throughput needs. You’ll also want to confirm data residency, compliance requirements, and how your organization handles sensitive data before committing.
Weaviate is an open-source vector database that gives you a full-featured “AI search backend” with a GraphQL API. If you like the idea of controlling your infrastructure (or deploying on your cloud of choice) but still want a product-like experience—schema, filtering, indexing options, and integrations—Weaviate is often on the shortlist.
At a high level, Weaviate stores objects (your documents, products, tickets, etc.) along with metadata and vector embeddings. You can query it with semantic similarity (“find things like this”) while also applying filters (“only from last 30 days,” “only category = support”). The GraphQL API makes it approachable for teams that want expressive queries without designing a lot of custom endpoints.
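A rough sketch of such a query, sent to Weaviate’s GraphQL endpoint over plain HTTP. The `Article` class, the `category` field, and the use of `nearText` (which requires a configured vectorizer module) are all assumptions for illustration.

```python
import requests

# Semantic similarity (nearText) combined with a structured filter (where).
graphql_query = """
{
  Get {
    Article(
      nearText: { concepts: ["refund policy"] },
      where: { path: ["category"], operator: Equal, valueText: "support" },
      limit: 5
    ) {
      title
      category
      _additional { distance }
    }
  }
}
"""

resp = requests.post(
    "http://localhost:8080/v1/graphql",   # default local Weaviate endpoint
    json={"query": graphql_query},
    timeout=10,
)
print(resp.json())
```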
Weaviate tends to fit teams that want open-source tooling, need control over where the system runs (their own cloud or data center), and value rich schema, filtering, and hybrid-search capabilities out of the box.
Pros: Strong schema/metadata support, a feature-rich ecosystem of modules/integrations, and configurable indexing approaches that let you tune performance.
Cons: If you run it yourself, you’re responsible for operating it—upgrades, scaling, monitoring, backups, and incident response. Also, as you add modules, multi-tenancy, and more complex schemas, the system can become harder to reason about unless you set clear conventions early.
If you’re comparing options, Weaviate often sits between “simple add-on inside your database” and “fully managed service,” offering flexibility at the cost of operational ownership.
Picking a vector database is less about “best” and more about fit: where you want to run it, how big you expect it to get, what your queries look like, and how much operational work your team can take on.
pgvector is “vectors inside Postgres.” It’s ideal if your app already lives on Postgres and you want one database for both business data and embeddings.
Pinecone is managed. You trade control for speed of adoption: fewer knobs, less infrastructure to run.
Weaviate is open-source and can be self-hosted or consumed as a managed offering. It’s a good middle path if you want a vector-native system but still prefer open tooling.
At smaller scales, all three can work well. As you grow, ask how quickly the collection and query volume will expand, what your queries look like (relational filtering, hybrid search, multi-tenancy), and how much operational work your team can take on.
If you expect rapid growth and high QPS, Pinecone often wins on operational simplicity. If growth is moderate and you already run Postgres at scale, pgvector can be cost-effective.
If you need heavy relational filtering (joins, complex predicates) alongside similarity search, pgvector is compelling.
If you need hybrid search (keyword + semantic), rich filtering, or strong multi-tenant isolation, compare Pinecone and Weaviate feature-by-feature.
Be honest about backups, monitoring, upgrades, and on-call load. Managed reduces burden. Self-hosted can be cheaper, but only if your team has the skills (and time) to run it reliably.
Good vector search starts with a boring but reliable record shape. Treat every “searchable unit” as a row/object that can be fetched, filtered, and explained later.
At a minimum, store a stable ID, the chunk text (or a pointer to it), the embedding vector, a reference to the source document, and the metadata you’ll filter on.
This keeps retrieval simple: vector search returns ids, then you fetch the chunk + context to show users or feed RAG.
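One possible record shape, sketched as a Python dataclass; the field names are illustrative, and the model/version fields anticipate the re-embedding discussion below.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

def _now() -> datetime:
    return datetime.now(timezone.utc)

@dataclass
class Chunk:
    """One searchable unit: enough to retrieve, filter, and explain a result later."""
    id: str                       # stable ID returned by vector search
    source_id: str                # document / page / ticket this chunk came from
    text: str                     # the chunk itself (or a pointer, if content is sensitive)
    embedding: list[float]        # the vector used for similarity search
    metadata: dict = field(default_factory=dict)    # tenant, language, category, permissions...
    embedding_model: str = "my-embedding-model-v1"  # hypothetical name; track it for re-embedding
    chunking_version: str = "v1"
    created_at: datetime = field(default_factory=_now)
```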
Chunking is the biggest quality lever you control. Smaller chunks are more “precise” but can miss context; larger chunks carry context but dilute the signal.
A common starting point is 200–400 tokens with 10–20% overlap, then adjust based on your content. For APIs and legal text, smaller chunks often work better; for narratives, slightly larger chunks tend to preserve meaning.
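A naive word-based chunker is enough to experiment with size and overlap; real pipelines usually count tokens with the embedding model’s tokenizer, so treat this as a sketch.

```python
def chunk_words(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks. Sizes here are in words; in practice,
    count tokens with your embedding model's tokenizer."""
    words = text.split()
    chunks, step = [], max(chunk_size - overlap, 1)
    for start in range(0, len(words), step):
        piece = words[start:start + chunk_size]
        if piece:
            chunks.append(" ".join(piece))
        if start + chunk_size >= len(words):
            break
    return chunks

# ~300-word chunks with ~17% overlap, roughly in the 200-400 token / 10-20% overlap range.
```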
Store metadata you will actually query: tenant, source, language, category, dates, and permission or access-level fields.
Avoid dumping huge JSON blobs; keep frequently-filtered fields easy to index.
Embeddings aren’t timeless. Track embedding_model, model_version, and chunking_version (plus created_at). When you upgrade models, you can re-embed in parallel and gradually switch traffic without mixing incompatible vectors.
Vector search can feel “instant” in a demo, then get slower or more expensive in production. The good news: the main drivers are predictable, and you can manage them whether you use pgvector in Postgres, Pinecone, or Weaviate.
Most teams underestimate the non-search parts: generating embeddings, re-embedding after model or chunking changes, storage growth, and day-to-day monitoring.
Better similarity search doesn’t automatically mean better answers.
Create a small test set: 30–100 real queries, each with a few “good” expected results. Measure relevance (hit rate in top-k) and track changes when you tweak chunking, indexes, or prompts.
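A minimal hit-rate metric over such a test set could look like this; `search` stands in for whatever retrieval stack you are evaluating.

```python
def hit_rate_at_k(test_set: list[dict], search, k: int = 5) -> float:
    """Fraction of queries where at least one expected chunk appears in the top-k results.

    test_set: [{"query": "...", "expected_ids": {"chunk-12", "chunk-40"}}, ...]
    search:   callable taking (query, k) and returning a list of chunk IDs.
    """
    hits = 0
    for case in test_set:
        returned = set(search(case["query"], k))
        if returned & set(case["expected_ids"]):
            hits += 1
    return hits / len(test_set)

# Re-run whenever you change chunking, embeddings, index settings, or prompts,
# and compare the number against the previous run.
```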
Treat embeddings as potentially sensitive: they are derived from your content, so access to the vector store deserves the same care as access to the content itself.
Vector search quality isn’t just about indexes—it’s also about how you operate the system day to day. A few governance habits prevent “mystery results” and make audits far less stressful.
If your documents contain sensitive data, consider keeping the raw content in your primary datastore (object storage, database, DMS) and storing only:
This reduces exposure if the vector store is compromised and simplifies access control. It also helps when you use multiple backends (e.g., pgvector for internal apps, Pinecone for a public feature).
Embeddings can “remember” old text if you don’t clean them up: when a source document is deleted or rewritten, delete or re-embed the corresponding vectors too.
Log enough to debug relevance without logging secrets: query identifiers, retrieved chunk IDs, similarity scores, and the embedding model and index settings in play.
This makes drift and regressions obvious after model or data changes.
Plan for retention (how long vectors and logs live), encryption in transit/at rest, and audit needs (who searched what, when). If you operate in regulated environments, document data flows and access paths so reviews don’t block releases.
Even a solid vector database setup can disappoint if a few common pitfalls sneak in. Here are the ones that show up most often—and how to fix them early.
Vectors are great for “meaning,” not for hard constraints. If you use semantic search as the only tool, results can feel random or unsafe.
Avoid it: combine similarity search with structured filters (tenant_id, product category, language, date ranges). Treat metadata filtering as a first-class part of query design, not an afterthought.
A demo that looks good on a handful of prompts can hide serious recall and relevance issues.
Avoid it: build a small evaluation set of real queries with “good answer” targets. Track simple metrics over time (top-k relevance, click/selection rate, or human judgments). Re-run evaluations whenever you change embeddings, chunking, or indexing settings.
Embedding models evolve. Switching models (or even versions) changes vector space, which can silently degrade retrieval.
Avoid it: store an embedding_model field and treat embeddings as a versioned artifact. Keep a re-embedding pipeline and plan for backfills (often done incrementally). If cost is a concern, re-embed the most-used content first.
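A sketch of an incremental backfill under those assumptions: a chunks table that records embedding_model plus some usage signal. The `hit_count` column, `embed()` call, and `to_vector_literal()` helper are all hypothetical.

```python
import psycopg

CURRENT_MODEL = "my-embedding-model-v2"   # hypothetical new model name

with psycopg.connect("dbname=app") as conn:
    # Re-embed the most-used content first, in small batches.
    rows = conn.execute(
        """
        SELECT id, content FROM chunks
        WHERE embedding_model IS DISTINCT FROM %s
        ORDER BY hit_count DESC
        LIMIT 500
        """,
        (CURRENT_MODEL,),
    ).fetchall()

    for chunk_id, content in rows:
        new_vec = embed(content)   # hypothetical embedding call for the new model
        conn.execute(
            "UPDATE chunks SET embedding = %s::vector, embedding_model = %s WHERE id = %s",
            (to_vector_literal(new_vec), CURRENT_MODEL, chunk_id),  # hypothetical literal helper
        )
    conn.commit()
```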
If your app has access control, retrieval must respect it—otherwise you can surface restricted content.
Avoid it: enforce permissions in the retrieval step using per-tenant indexes, metadata filters, or precomputed ACL fields. Verify this with tests: “user A must never retrieve user B’s documents,” even in top-k candidates.
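A test along those lines might look like the sketch below, where `search_for_tenant` stands in for your retrieval call and the fixtures are whatever representative data you maintain.

```python
def test_tenant_isolation(search_for_tenant, tenant_a_queries, tenant_b_doc_ids):
    """User A must never retrieve user B's documents, even among top-k candidates."""
    for query in tenant_a_queries:
        results = search_for_tenant(tenant="tenant-a", query=query, k=50)
        leaked = {r["id"] for r in results} & set(tenant_b_doc_ids)
        assert not leaked, f"Tenant B documents leaked into tenant A results: {leaked}"
```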
A vector database is a system designed to store embeddings (numeric representations of text, images, or other data) and quickly retrieve the most similar items. It fits best when users search by meaning (semantic search) or when you’re building RAG (retrieval-augmented generation) so an AI assistant can pull relevant passages from your own content before answering.
Here are practical rules of thumb: start with pgvector if your app already lives on Postgres and you want fewer moving parts, pick Pinecone when speed-to-production and managed scaling matter most, and shortlist Weaviate when you want an open-source, vector-native system with rich schema and hybrid-search features.
Build a tiny proof of concept in a day: embed a few hundred chunks of your own content, run a few dozen real queries against them, and judge whether the top results would actually satisfy a user.
If you want more implementation and cost guidance, see /blog. For pricing considerations or hosted options, check /pricing.
A vector database stores and searches embeddings (vectors: long lists of numbers) that represent the meaning of text, images, or other data. Instead of matching exact words, it returns items that are most similar to a query in semantic space—useful when people phrase the same intent in different ways.
An embedding is a numerical “fingerprint” of content produced by an ML model. You don’t interpret each number; you use the whole vector to compare items. Similar items (e.g., “refund policy” and “return a product”) end up near each other, enabling semantic retrieval.
Keyword search matches words and phrases (often great for exact terms). Vector search matches meaning (great for synonyms and paraphrases). In practice, teams often use hybrid search: keyword scoring to reward exact terms like SKUs, names, or error strings, plus vector similarity to capture meaning.
SQL is best for structured, exact questions: IDs, joins, aggregations, and strict filters. Vector search is best for fuzzy “find similar” questions. A common pattern is to apply SQL/metadata filters first, then rank the allowed set by vector similarity.
Most systems use Approximate Nearest Neighbor (ANN) indexing. Rather than comparing your query vector to every stored vector, the index narrows candidates so only a small subset gets fully scored. You trade a bit of “perfect best result” for big gains in latency and cost.
Cosine similarity compares vector direction (are they pointing the same way?). Dot product rewards similar direction and can also incorporate magnitude depending on how embeddings are produced/normalized.
Practically: pick the metric recommended for your embedding model and stick to it consistently during indexing and querying.
Chunking controls what each vector represents. Too large: you retrieve noisy, mixed-topic context. Too small: you lose important context.
A practical starting point is 200–400 tokens per chunk with 10–20% overlap.
Then adjust by content type (APIs/legal often smaller; narratives often larger).
RAG is typically a pipeline: chunk and embed your content, retrieve the most relevant chunks for each query, optionally re-rank them, and then pass the top chunks to an LLM as grounded context for the answer.
Choose based on deployment and ops tolerance: pgvector keeps vectors inside the Postgres you already run, Pinecone gives you a fully managed service with minimal operations, and Weaviate offers an open-source, vector-native system you can self-host or consume as a managed offering.
Common pitfalls include relying on similarity alone without metadata filters, skipping evaluation, failing to version and re-embed when models change, and ignoring permissions in retrieval.