Compare major database types—relational, columnar, document, graph, vector, key-value, and more—with use cases, tradeoffs, and tips to choose well.

A “database type” isn’t just a label—it’s shorthand for how a system stores data, how you query it, and what it’s optimized to do. That choice directly affects speed (what’s fast vs. slow), cost (hardware or cloud spend), and capabilities (transactions, analytics, search, replication, and more).
Different database types make different tradeoffs, and those design choices influence what's fast, what's cheap, and what's even possible for a given workload.
This article walks through the major types of databases and explains, for each one, how it stores and queries data, where it shines, and which tradeoffs to watch for.
Many modern products blur the lines. Some relational databases add JSON support that overlaps with a document database. Some search and analytics platforms offer vector indexing like a vector database. Others combine streaming and storage with time-series features.
So “type” isn’t a strict box—it’s still useful as a way to understand default strengths and the kinds of workloads a database handles best.
Start with your main workload: the reads, writes, and queries your application will run most often.
Then use the “How to Choose the Right Database Type” section to narrow it down based on scale, consistency needs, and the queries you’ll run most often.
Relational databases are what many people picture when they hear “database.” Data is organized into tables made of rows (records) and columns (fields). A schema defines what each table looks like—what columns exist, what types they have, and how tables relate to each other.
Relational systems are typically queried with SQL (Structured Query Language). SQL is popular because it’s readable and expressive:
- Filter and sort results (WHERE, ORDER BY).
- Combine related tables (JOIN).
- Summarize and aggregate (GROUP BY).

Most reporting tools, analytics platforms, and business apps speak SQL, which makes it a safe default when you want broad compatibility.
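As a minimal illustration of those building blocks, the sketch below runs a query that filters, joins, and groups in one statement. It uses Python's built-in sqlite3 module with made-up customers and orders tables:

```python
import sqlite3

# Illustrative schema: customers(id, country) and orders(id, customer_id, total, created_at).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         total REAL, created_at TEXT);
""")

# Filter (WHERE), combine tables (JOIN), and aggregate (GROUP BY) in one readable query.
rows = conn.execute("""
    SELECT c.country, SUM(o.total) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.created_at >= '2024-01-01'
    GROUP BY c.country
    ORDER BY revenue DESC
""").fetchall()
```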
Relational databases are known for ACID transactions, which help keep data correct:

- Atomicity: a group of changes either fully succeeds or is rolled back entirely.
- Consistency: data moves from one valid state to another, respecting constraints.
- Isolation: concurrent transactions don't see each other's half-finished work.
- Durability: once committed, changes survive crashes and restarts.
This matters when mistakes are costly—like double-charging a customer or losing a stock update.
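Here's a minimal sketch of that all-or-nothing behavior, again using sqlite3 with an invented accounts table; if any step of the transfer fails, none of it is applied:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])

try:
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # neither update is applied: no half-finished transfer, no double charge
```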
A relational database is usually the right fit for structured, well-defined data and workflows such as:

- Orders, invoices, and payments
- User accounts, roles, and permissions
- Inventory, bookings, and other records where counts and balances must stay correct
The same structure that makes relational databases reliable can add friction:

- Schema changes require migrations, which take planning as tables grow.
- Join-heavy queries can get expensive at very large scale.
- Horizontal scaling (sharding) is harder than with simpler key-based models.
When your data model changes constantly—or you need extreme horizontal scale with simpler access patterns—other database types may be a better match.
Columnar databases store data “by column” rather than “by row.” That one change has a big impact on speed and cost for analytics workloads.
In a traditional row-store (common in a relational database), all the values for a single record sit together. That’s great when you frequently fetch or update one customer/order at a time.
In a column-store (a columnar database), all values for the same field sit together—every price, every country, every timestamp. This makes it efficient to read only the few columns needed for a report, without pulling entire rows from disk.
Analytics and BI queries often:

- Scan many rows but read only a few columns
- Aggregate with SUM, AVG, COUNT, and group by dimensions

Columnar storage accelerates these patterns because it reads less data and compresses extremely well (similar values clustered together compress nicely). Many columnar engines also use vectorized execution and smart indexing/partitioning to speed up large scans.
Columnar systems shine for dashboards and reporting: “revenue by week,” “top 20 products by region,” “conversion rate by channel,” or “errors by service over the last 30 days.” These queries touch many rows but relatively few columns.
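The query shape is ordinary SQL; what changes is how much data the engine has to read. Below is a hypothetical "revenue by week and region" report; the table, columns, and date functions are illustrative and vary by engine:

```python
# Illustrative only: this query may touch millions of rows but reads just three
# columns (order_date, region, total). A columnar engine reads only those
# columns, already well compressed, instead of pulling every full row.
weekly_revenue_sql = """
    SELECT date_trunc('week', order_date) AS week,
           region,
           SUM(total) AS revenue
    FROM orders
    WHERE order_date >= DATE '2024-01-01'
    GROUP BY week, region
    ORDER BY week, revenue DESC
"""
```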
If your workload is mostly “get one record by ID” or “update a single row dozens of times per second,” columnar can feel slower or more expensive. Writes are often optimized for batches (append-heavy ingestion) rather than frequent, tiny updates.
Columnar databases are a strong fit for:

- BI dashboards and recurring reports
- Log, event, and clickstream analytics
- Large historical aggregations that scan millions of rows
If your priority is fast aggregations across lots of data, columnar is usually the first database type to evaluate.
Document databases store data as “documents”—self-contained records that look a lot like JSON. Instead of splitting information across many tables, you typically keep related fields together in one object (including nested arrays and sub-objects). That makes them a natural fit for application data.
A document might represent a user, a product, or an article—complete with attributes that can differ from one document to the next. One product can have size and color, another can have dimensions and materials, without forcing a single rigid schema for all records.
This flexibility is especially helpful when your requirements change frequently or when different items have different sets of fields.
To avoid scanning every document, document databases use indexes—data structures that help the database quickly locate matching documents for a query. You can index common lookup fields (like email, sku, or status), and many systems can also index nested fields (like address.city). Indexes speed up reads but add overhead to writes, because the index must be updated when documents change.
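A short sketch of both ideas: flexible document shapes plus an index on a nested field. It assumes a MongoDB-compatible store reached through the PyMongo client; the database, collection, and field names are made up:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
users = client.app.users

# One self-contained document per user, including a nested sub-object.
users.insert_one({
    "email": "ada@example.com",
    "status": "active",
    "address": {"city": "Berlin", "country": "DE"},
})

# Index common lookup fields, including the nested address.city.
users.create_index("email")
users.create_index("address.city")

# This query can use the nested index instead of scanning every document.
berliners = list(users.find({"address.city": "Berlin"}))
```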
Document databases shine with evolving schemas, nested data, and API-friendly payloads. The tradeoffs usually show up when you need:

- Complex joins across many collections
- Duplicated data kept in sync for read performance
- Multi-document transactions, which often cost more than single-document writes
They’re a strong choice for content management, product catalogs, user profiles, and backend APIs—anywhere your data maps cleanly to “one object per page/screen/request.”
Key-value stores are the simplest database model: you store a value (anything from a string to a JSON blob) and retrieve it using a unique key. The core operation is basically “give me the value for this key,” which is why these systems can be extremely fast.
Because reads and writes are centered on a single primary key, key-value stores can be optimized for low latency and high throughput. Many are designed to keep hot data in memory, minimize complex query planning, and scale horizontally.
This simplicity also shapes how you model data: instead of asking the database to “find all users in Berlin who signed up last week,” you usually design keys that already point to the exact record you want (for example, user:1234:profile).
Key-value stores are widely used as a cache in front of a slower primary database (like a relational database). If your app repeatedly needs the same data—product details, user permissions, pricing rules—caching the result by key avoids recomputing or re-querying.
They’re also a natural fit for session storage (e.g., session:<id> -> session data) because sessions are read and updated frequently, and they expire automatically.
Most key-value stores support a TTL (time to live) so data can expire without manual cleanup—ideal for sessions, one-time tokens, and rate limit counters.
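The sketch below shows these patterns together: session storage with a TTL, a small cache entry, and a rate-limit counter. It assumes a Redis-compatible store and the redis-py client; the keys, TTLs, and data are illustrative:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

# Session data keyed by a known ID, expiring automatically after 30 minutes.
r.set("session:abc123", json.dumps({"user_id": 1234, "roles": ["editor"]}), ex=1800)

# Cache a value the app would otherwise recompute or re-query.
r.set("user:1234:permissions", json.dumps(["read", "write"]), ex=300)

# A simple rate-limit counter: increment, then start the window on first use.
hits = r.incr("ratelimit:user:1234")
if hits == 1:
    r.expire("ratelimit:user:1234", 60)  # window resets after 60 seconds
```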
When memory is limited, systems often use eviction policies (like least-recently-used) to remove old entries. Some products are memory-first, while others can persist data to disk for durability. Choosing between memory and disk typically comes down to whether you’re optimizing for speed (memory) or retention/recovery (disk or persistence).
Key-value stores shine when you already know the key. They’re less suited when your questions are open-ended.
Many have limited query patterns compared to SQL databases. Support for secondary indexes (querying by fields inside the value) varies: some provide it, some provide partial options, and others encourage you to maintain your own lookup keys.
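One common workaround is to maintain your own lookup key alongside the record. A tiny sketch, again assuming a Redis-style store (key names are made up):

```python
import json
import redis

r = redis.Redis()

# Store the record under its primary key, plus a manual "email -> id" lookup key.
r.set("user:1234:profile", json.dumps({"email": "ada@example.com", "name": "Ada"}))
r.set("user:email:ada@example.com", "1234")

# Later: resolve email -> id -> record with two key lookups.
user_id = r.get("user:email:ada@example.com")
profile = r.get(f"user:{user_id.decode()}:profile") if user_id else None
```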
Key-value stores are a great fit for:

- Caching in front of a slower primary database
- Session storage and short-lived tokens
- Rate-limit counters, feature flags, and other fast lookups by ID
If your access pattern is “fetch/update by ID” and latency matters, a key-value store is often the simplest way to get reliable speed.
Wide-column databases (sometimes called wide-column stores) organize data into column families. Instead of thinking in terms of one fixed table with the same columns for every row, you group related columns together and can store different sets of columns per row within a family.
Despite the similar names, wide-column databases are not the same as a columnar database used for analytics.
A columnar database stores each column separately to scan huge datasets efficiently (great for reporting and aggregates). A wide-column database is built for operational workloads at very large scale, where you need to write and read lots of records quickly across many machines.
Wide-column systems are designed for:

- Very high write throughput (append-heavy ingestion)
- Horizontal scaling across many machines
- Fast, predictable reads when you query by a known key
The most common pattern is:

- A partition key that groups related rows together (for example, a device or user ID)
- A clustering key (often a timestamp) that orders rows within each partition
This makes them a strong fit for time-ordered data and append-heavy workloads.
With wide-column databases, data modeling is query-driven: you usually design tables around the exact queries you need to run. That can mean duplicating data in different shapes to support different access patterns.
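A purely conceptual in-memory sketch of that idea (no specific wide-column product assumed): rows are grouped by a partition key and ordered by a time-based clustering key, so the query you designed for stays a single cheap read:

```python
from collections import defaultdict

# Partition key: (sensor_id, day). Clustering key: timestamp.
table = defaultdict(list)

def write_reading(sensor_id, day, ts, value):
    # Appends are cheap; each partition collects the rows for one sensor-day.
    table[(sensor_id, day)].append((ts, value))

def latest_readings(sensor_id, day, limit=10):
    # A real store keeps rows ordered by the clustering key; we sort here for simplicity.
    return sorted(table[(sensor_id, day)], reverse=True)[:limit]

write_reading("sensor-42", "2024-06-01", "2024-06-01T10:00:00Z", 21.5)
write_reading("sensor-42", "2024-06-01", "2024-06-01T10:01:00Z", 21.7)
print(latest_readings("sensor-42", "2024-06-01"))
```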
They also tend to offer limited joins and fewer ad-hoc query options than a relational database. If your application relies on complex relationships and flexible querying, you may feel constrained.
Wide-column databases are often used for IoT events, messaging and activity streams, and other large-scale operational data where fast writes and predictable key-based reads matter more than rich relational queries.
Graph databases store data the way many real systems behave: as things connected to other things. Instead of forcing relationships into tables and join tables, the connections are part of the model.
A graph typically has:

- Nodes: the entities (people, products, accounts)
- Edges (relationships): the connections between them (follows, bought, reports to)
- Properties: attributes attached to both nodes and edges
This makes it natural to represent networks, hierarchies, and many-to-many relationships without contorting your schema.
Relationship-heavy queries often require many joins in a relational database. Each additional join can add complexity and cost as your data grows.
Graph databases are designed for traversals—walking from one node to connected nodes, then to their connections, and so on. When your questions routinely look like “find connected things within 2–6 steps,” traversals can stay fast and readable even as the network expands.
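Graph databases run these traversals natively, but the shape of the question is easy to sketch. Here's a tiny made-up social graph and a breadth-first walk that finds everything within a given number of hops:

```python
from collections import deque

graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}

def within_hops(start, max_hops):
    # Breadth-first traversal: visit neighbors, then their neighbors, and so on.
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append((neighbor, depth + 1))
                frontier.append((neighbor, depth + 1))
    return found

print(within_hops("alice", 2))  # friends plus friends-of-friends
```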
Graph databases shine for:

- Social networks and recommendations (“people who bought this also bought…”)
- Fraud detection across shared devices, cards, and addresses
- Knowledge graphs, org charts, and permission hierarchies
Graphs can be a shift for teams: data modeling is different, and query languages (often Cypher, Gremlin, or SPARQL) may be new. You’ll also want clear conventions for relationship types and direction to keep the model maintainable.
If your relationships are simple, your queries are mostly filtering/aggregations, and a handful of joins covers the “connected” parts, a relational database may remain the most straightforward choice—especially when transactions and reporting are already working well.
Vector databases are designed for a specific kind of question: “Which items are most similar to this one?” Instead of matching exact values (like an ID or a keyword), they compare embeddings—numeric representations of content (text, images, audio, products) produced by AI models. Items with similar meaning tend to have embeddings that end up close together in a multi-dimensional space.
A normal search might miss results if the wording is different (“laptop sleeve” vs. “notebook case”). With embeddings, similarity is based on meaning, so the system can surface relevant results even when the exact words don’t match.
The main operation is nearest neighbor search: given a query vector, retrieve the closest vectors.
In real apps, you usually combine similarity with filters, such as:

- Only documents the current user is allowed to see
- Only products that are in stock or in a given category
- Only content in a certain language or date range
This “filter + similarity” pattern is how vector search becomes practical for real datasets.
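Here's a brute-force sketch of “filter + similarity” using NumPy and random made-up vectors. Real vector databases answer the same question with approximate nearest-neighbor indexes rather than a full scan:

```python
import numpy as np

embeddings = np.random.rand(1000, 384)                  # one vector per item
metadata = [{"in_stock": i % 3 != 0} for i in range(1000)]

def search(query_vec, top_k=5, require_in_stock=True):
    # Filter on metadata first, then rank remaining items by cosine similarity.
    candidates = [i for i, m in enumerate(metadata)
                  if m["in_stock"] or not require_in_stock]
    vecs = embeddings[candidates]
    sims = vecs @ query_vec / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(query_vec))
    best = np.argsort(-sims)[:top_k]
    return [(candidates[i], float(sims[i])) for i in best]

results = search(np.random.rand(384))
```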
Common uses include:

- Semantic search over documents and support content
- Retrieval for RAG (fetching relevant context for an LLM)
- Recommendations and “more like this” features
- Detecting near-duplicate content
Vector search relies on specialized indexes. Building and updating those indexes can take time, and they can use significant memory. You’ll also often choose between higher recall (finding more of the true best matches) and lower latency (faster responses).
Vector databases rarely replace your main database. A common setup is: store the “source of truth” (orders, users, documents) in a relational database or document database, and store embeddings + search indexes in a vector database—then join results back to the primary store for full records and permissions.
Time-series databases (TSDBs) are designed for data that arrives continuously and is always tied to a timestamp. Think of CPU usage every 10 seconds, API latency for each request, sensor readings every minute, or stock prices changing multiple times per second.
Most time-series records combine:

- A timestamp
- One or more measured values (CPU %, latency, temperature, price)
- Tags or labels describing the source (service, region, device ID)
This structure makes it easy to ask questions like “show error rate by service” or “compare latency across regions.”
Because the data volume can grow quickly, TSDBs typically focus on:

- Fast, append-heavy ingestion
- Compression tuned for timestamped values
- Downsampling and rollups (keeping coarser summaries of older data)
- Retention policies that expire raw data automatically
These features keep storage and query costs predictable without constant manual cleanup.
TSDBs shine when you need time-based calculations, such as:

- Rates and counts per interval (requests per minute, errors per hour)
- Moving averages and percentiles over a window
- Comparisons against a previous period (this week vs. last week)
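The core of most of these calculations is bucketing points into time windows. A rough sketch with made-up latency samples (a real TSDB does this natively, plus rollups and retention):

```python
from collections import defaultdict
from statistics import mean

points = [
    (1717240800, 120.0), (1717240815, 180.0), (1717240830, 95.0),
    (1717240860, 210.0), (1717240875, 160.0),
]  # (unix_timestamp, latency_ms)

# Downsample into per-minute averages by snapping each timestamp to a 60 s bucket.
buckets = defaultdict(list)
for ts, value in points:
    buckets[ts - ts % 60].append(value)

per_minute_avg = {bucket: mean(values) for bucket, values in sorted(buckets.items())}
print(per_minute_avg)
```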
Typical use cases include monitoring, observability, IoT/sensors, and financial tick data.
The tradeoff: TSDBs are not the best choice for complex, ad-hoc relationships across many entities (for example, deeply nested joins like “users → teams → permissions → projects”). For that, a relational or graph database is usually a better fit.
A data warehouse is less a single “type of database” and more a workload + architecture: many teams querying large historical data to answer business questions (revenue trends, churn, inventory risk). You can buy it as a managed product, but what makes it a warehouse is how it’s used—centralized, analytical, and shared.
Most warehouses accept data in two common ways:

- Batch loads: scheduled ETL/ELT jobs that move data in bulk (hourly, nightly)
- Streaming ingestion: events loaded continuously, close to real time
Warehouses are usually optimized for analytics with a few practical tricks:

- Columnar storage and aggressive compression
- Partitioning and clustering so queries scan less data
- Caching and materialized views for repeated dashboard queries
- Separating storage from compute so each can scale independently
Once multiple departments rely on the same numbers, you’ll need access control (who can see what), audit trails (who queried/changed data), and lineage (where a metric came from and how it was transformed). This is often as important as query speed.
A lakehouse blends warehouse-style analytics with a data lake’s flexibility—useful when you want one place for both curated tables and raw files (logs, images, semi-structured events), without duplicating everything. It’s a good fit when data volume is high, formats vary, and you still need SQL-friendly reporting.
Choosing among database types is less about “best” and more about fit: what you need to query, how quickly, and what happens when parts of the system fail.
A quick rule of thumb:

- OLTP (online transaction processing): many small, frequent reads and writes, such as placing orders, logging in, or updating a record.
- OLAP (online analytical processing): large scans and aggregations over historical data, such as reports, dashboards, and trend analysis.
Relational databases often shine for OLTP; columnar systems, warehouses, and lakehouses are commonly used for OLAP.
When a network hiccup splits your system, you typically can’t have all three at once:

- Consistency: every read sees the most recent write
- Availability: every request gets a response, even during problems
- Partition tolerance: the system keeps operating when nodes can’t reach each other
Many distributed databases choose to stay available during issues and reconcile later (eventual consistency). Others prioritize strict correctness, even if that means refusing some requests until things are healthy.
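One simple (and simplified) reconciliation strategy is last-write-wins: every write carries a timestamp, and when replicas disagree, the newest version is kept. A conceptual sketch with made-up data; real systems often use more careful schemes such as vector clocks or CRDTs:

```python
replica_a = {"cart:42": {"items": ["book"], "updated_at": 1717241000}}
replica_b = {"cart:42": {"items": ["book", "pen"], "updated_at": 1717241030}}

def reconcile(a, b):
    # Keep the newest version of each key across both replicas.
    merged = {}
    for key in set(a) | set(b):
        versions = [v for v in (a.get(key), b.get(key)) if v is not None]
        merged[key] = max(versions, key=lambda v: v["updated_at"])
    return merged

print(reconcile(replica_a, replica_b))  # the later write wins
```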
If many users update the same data, you need clear rules. Transactions bundle steps into “all-or-nothing.” Locking and isolation levels prevent conflicts, but can reduce throughput; looser isolation improves speed but may allow anomalies.
Plan for backups, replication, and disaster recovery early. Also consider how easy it is to test restores, monitor lag, and perform upgrades—these day-two details often matter as much as query speed.
Choosing between the major types of databases is less about what’s trendy and more about what you need to do with your data. A practical way to start is to work backward from your queries and workloads.
Write down the top 5–10 things your app or team must do, for example:

- “Fetch this user’s profile by ID in under 50 ms”
- “Show revenue by week for the last year”
- “Find products similar to this one”
- “Ingest thousands of events per second and keep 90 days of history”
This narrows the options faster than any feature checklist.
Use this quick “shape” checklist:

- Structured rows with strict relationships → relational
- Self-contained, evolving JSON-like objects → document
- Lookups by a known key at low latency → key-value
- Timestamped streams of measurements → time-series
- Heavily connected data and multi-hop questions → graph
- Similarity over embeddings → vector
- Large aggregations over history → columnar / warehouse
Performance targets define architecture. Set rough numbers (p95 latency, reads/writes per second, data retention). Cost usually follows: the more data you store, scan, and replicate, and the faster you need answers, the more you’ll spend.
Many teams use two databases: one for operations (e.g., relational) and one for analytics (e.g., columnar/warehouse). The “right” choice is the one that makes your most important queries simplest, fastest, and cheapest to run reliably.
If you’re prototyping or shipping new features fast, the database decision is often coupled to your development workflow. Platforms like Koder.ai (a vibe-coding platform that generates web, backend, and mobile apps from chat) can make this more concrete: for example, Koder.ai’s default backend stack uses Go + PostgreSQL, which is a strong starting point when you need transactional correctness and broad SQL tooling.
As your product grows, you can still add specialized databases (like a vector database for semantic search or a columnar warehouse for analytics) while keeping PostgreSQL as the system of record. The key is to start with the workloads you must support today—and keep the door open for “add a second store” when the query patterns demand it.
A “database type” is shorthand for three things:

- How the system stores data
- How you query it
- What it’s optimized to do
Picking the type is really picking defaults for performance, cost, and operational complexity.
Start from your top 5–10 queries and write patterns, then map them to the strengths summarized in the comparison table below.
Relational databases are a strong default when you need:

- ACID transactions and strong constraints
- Joins across well-defined tables
- Broad compatibility with SQL reporting and tooling
They can become painful when you’re doing constant schema changes, or when you need extreme horizontal scale with lots of join-heavy queries spread across shards.
ACID is a reliability guarantee for multi-step changes: atomicity (all-or-nothing), consistency (constraints always hold), isolation (concurrent transactions don’t interfere), and durability (committed changes survive crashes).
It matters most for workflows where mistakes are expensive (payments, bookings, inventory updates).
Columnar databases are best when queries scan many rows, read only a few columns, and aggregate heavily (SUM, COUNT, AVG, GROUP BY). They’re less ideal for OLTP-style workloads like frequent small updates or “fetch one record by ID” patterns, which row-stores tend to handle more naturally.

A document database is a good fit when:

- Your data maps naturally to self-contained, JSON-like objects
- Different records need different sets of fields
- The schema changes often as the product evolves
Watch for tradeoffs around complex joins, duplicated data for read performance, and the performance cost of multi-document transactions.
Use a key-value store when your access pattern is mostly:

- Fetching or updating a record by a known key
- Low-latency, high-throughput lookups (caching, sessions, counters)
Plan around limitations: ad-hoc querying is usually weak, and secondary indexing support varies—often you design keys and additional lookup keys yourself.
Despite the similar name, they target different workloads:

- A columnar database stores each column separately so analytical queries can scan huge datasets efficiently.
- A wide-column database groups columns into families and is built for operational workloads: high-volume writes and key-based reads across many machines.
Wide-column systems typically require query-driven modeling (design tables around the exact access patterns) and don’t aim to be flexible like SQL with joins.
Use a graph database when your core questions are about relationships, like:

- “Who is connected to whom, and how closely?”
- “Which accounts share devices, cards, or addresses?” (fraud detection)
- “What should we recommend based on what similar users liked?”
Graphs excel at traversals (walking relationships) where a relational approach would need many joins. The tradeoff is adopting new modeling conventions and query languages (often Cypher/Gremlin/SPARQL).
A vector database is designed for similarity search over embeddings (numeric representations of meaning). It’s commonly used for:

- Semantic search (“find content that means the same thing”)
- Retrieval for RAG pipelines
- Recommendations and “more like this” features
In practice, it’s usually paired with a relational/document store: keep the source-of-truth data there, store embeddings + vector indexes in the vector DB, then join results back for full records and permissions.
| Primary use case | Best fit (often) | Why |
|---|---|---|
| Transactions, invoices, user accounts | Relational (SQL) | Strong constraints, joins, consistency |
| App data with evolving fields | Document | Flexible schema, natural JSON |
| Real-time caching/session state | Key-value store | Fast lookups by key |
| Clickstreams/metrics over time | Time-series | High ingest + time-based queries |
| BI dashboards, large aggregations | Columnar | Fast scans + compression |
| Social/knowledge relationships | Graph | Efficient relationship traversal |
| Semantic search, RAG retrieval | Vector | Similarity search over embeddings |
| Massive operational data at scale | Wide-column | Horizontal scaling, predictable queries |
If you do both OLTP and analytics, plan for two systems early (operational DB + analytics DB).