Compare major database types—relational, columnar, document, graph, vector, key-value, and more—with use cases, tradeoffs, and tips to choose well.

A “database type” isn’t just a label—it’s shorthand for how a system stores data, how you query it, and what it’s optimized to do. That choice directly affects speed (what’s fast vs. slow), cost (hardware or cloud spend), and capabilities (transactions, analytics, search, replication, and more).
Different database types make different tradeoffs, and those design choices influence what's fast, what's cheap, and what's even possible for a given workload.
This article walks through the major types of databases and explains, for each one, how it stores and queries data, where it shines, and which tradeoffs to watch for.
Many modern products blur the lines. Some relational databases add JSON support that overlaps with a document database. Some search and analytics platforms offer vector indexing like a vector database. Others combine streaming and storage with time-series features.
So “type” isn’t a strict box—it’s still useful as a way to understand default strengths and the kinds of workloads a database handles best.
Start with your main workload: the reads, writes, and queries your application will run most often.
Then use the “How to Choose the Right Database Type” section to narrow it down based on scale, consistency needs, and the queries you’ll run most often.
Relational databases are what many people picture when they hear “database.” Data is organized into tables made of rows (records) and columns (fields). A schema defines what each table looks like—what columns exist, what types they have, and how tables relate to each other.
Relational systems are typically queried with SQL (Structured Query Language). SQL is popular because it’s readable and expressive:
- Filter and sort results (WHERE, ORDER BY).
- Combine related tables (JOIN).
- Summarize and aggregate (GROUP BY).

Most reporting tools, analytics platforms, and business apps speak SQL, which makes it a safe default when you want broad compatibility.
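As a minimal illustration of those building blocks, the sketch below runs a query that filters, joins, and groups in one statement. It uses Python's built-in sqlite3 module with made-up customers and orders tables:

```python
import sqlite3

# Illustrative schema: customers(id, country) and orders(id, customer_id, total, created_at).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         total REAL, created_at TEXT);
""")

# Filter (WHERE), combine tables (JOIN), and aggregate (GROUP BY) in one readable query.
rows = conn.execute("""
    SELECT c.country, SUM(o.total) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.created_at >= '2024-01-01'
    GROUP BY c.country
    ORDER BY revenue DESC
""").fetchall()
```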
Relational databases are known for ACID transactions, which help keep data correct:

- Atomicity: a group of changes either fully succeeds or is rolled back entirely.
- Consistency: data moves from one valid state to another, respecting constraints.
- Isolation: concurrent transactions don't see each other's half-finished work.
- Durability: once committed, changes survive crashes and restarts.
This matters when mistakes are costly—like double-charging a customer or losing a stock update.
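Here's a minimal sketch of that all-or-nothing behavior, again using sqlite3 with an invented accounts table; if any step of the transfer fails, none of it is applied:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])

try:
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # neither update is applied: no half-finished transfer, no double charge
```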
A relational database is usually the right fit for structured, well-defined data and workflows such as:

- Orders, invoices, and payments
- User accounts, roles, and permissions
- Inventory, bookings, and other records where counts and balances must stay correct
The same structure that makes relational databases reliable can add friction:

- Schema changes require migrations, which take planning as tables grow.
- Join-heavy queries can get expensive at very large scale.
- Horizontal scaling (sharding) is harder than with simpler key-based models.
When your data model changes constantly—or you need extreme horizontal scale with simpler access patterns—other database types may be a better match.
Columnar databases store data “by column” rather than “by row.” That one change has a big impact on speed and cost for analytics workloads.
In a traditional row-store (common in a relational database), all the values for a single record sit together. That’s great when you frequently fetch or update one customer/order at a time.
In a column-store (a columnar database), all values for the same field sit together—every price, every country, every timestamp. This makes it efficient to read only the few columns needed for a report, without pulling entire rows from disk.
Analytics and BI queries often:

- Scan many rows but read only a few columns
- Aggregate with SUM, AVG, COUNT, and group by dimensions

Columnar storage accelerates these patterns because it reads less data and compresses extremely well (similar values clustered together compress nicely). Many columnar engines also use vectorized execution and smart indexing/partitioning to speed up large scans.
Columnar systems shine for dashboards and reporting: “revenue by week,” “top 20 products by region,” “conversion rate by channel,” or “errors by service over the last 30 days.” These queries touch many rows but relatively few columns.
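The query shape is ordinary SQL; what changes is how much data the engine has to read. Below is a hypothetical "revenue by week and region" report; the table, columns, and date functions are illustrative and vary by engine:

```python
# Illustrative only: this query may touch millions of rows but reads just three
# columns (order_date, region, total). A columnar engine reads only those
# columns, already well compressed, instead of pulling every full row.
weekly_revenue_sql = """
    SELECT date_trunc('week', order_date) AS week,
           region,
           SUM(total) AS revenue
    FROM orders
    WHERE order_date >= DATE '2024-01-01'
    GROUP BY week, region
    ORDER BY week, revenue DESC
"""
```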
If your workload is mostly “get one record by ID” or “update a single row dozens of times per second,” columnar can feel slower or more expensive. Writes are often optimized for batches (append-heavy ingestion) rather than frequent, tiny updates.
Columnar databases are a strong fit for:

- BI dashboards and recurring reports
- Log, event, and clickstream analytics
- Large historical aggregations that scan millions of rows
If your priority is fast aggregations across lots of data, columnar is usually the first database type to evaluate.
Document databases store data as “documents”—self-contained records that look a lot like JSON. Instead of splitting information across many tables, you typically keep related fields together in one object (including nested arrays and sub-objects). That makes them a natural fit for application data.
A document might represent a user, a product, or an article—complete with attributes that can differ from one document to the next. One product can have size and color, another can have dimensions and materials, without forcing a single rigid schema for all records.
This flexibility is especially helpful when your requirements change frequently or when different items have different sets of fields.
To avoid scanning every document, document databases use indexes—data structures that help the database quickly locate matching documents for a query. You can index common lookup fields (like email, sku, or status), and many systems can also index nested fields (like address.city). Indexes speed up reads but add overhead to writes, because the index must be updated when documents change.
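A short sketch of both ideas: flexible document shapes plus an index on a nested field. It assumes a MongoDB-compatible store reached through the PyMongo client; the database, collection, and field names are made up:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
users = client.app.users

# One self-contained document per user, including a nested sub-object.
users.insert_one({
    "email": "ada@example.com",
    "status": "active",
    "address": {"city": "Berlin", "country": "DE"},
})

# Index common lookup fields, including the nested address.city.
users.create_index("email")
users.create_index("address.city")

# This query can use the nested index instead of scanning every document.
berliners = list(users.find({"address.city": "Berlin"}))
```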
Document databases shine with evolving schemas, nested data, and API-friendly payloads. The tradeoffs usually show up when you need:

- Complex joins across many collections
- Duplicated data kept in sync for read performance
- Multi-document transactions, which often cost more than single-document writes
They’re a strong choice for content management, product catalogs, user profiles, and backend APIs—anywhere your data maps cleanly to “one object per page/screen/request.”
Key-value stores are the simplest database model: you store a value (anything from a string to a JSON blob) and retrieve it using a unique key. The core operation is basically “give me the value for this key,” which is why these systems can be extremely fast.
Because reads and writes are centered on a single primary key, key-value stores can be optimized for low latency and high throughput. Many are designed to keep hot data in memory, minimize complex query planning, and scale horizontally.
This simplicity also shapes how you model data: instead of asking the database to “find all users in Berlin who signed up last week,” you usually design keys that already point to the exact record you want (for example, user:1234:profile).
Key-value stores are widely used as a cache in front of a slower primary database (like a relational database). If your app repeatedly needs the same data—product details, user permissions, pricing rules—caching the result by key avoids recomputing or re-querying.
They’re also a natural fit for session storage (e.g., session:<id> -> session data) because sessions are read and updated frequently, and they expire automatically.
Most key-value stores support a TTL (time to live) so data can expire without manual cleanup—ideal for sessions, one-time tokens, and rate limit counters.
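The sketch below shows these patterns together: session storage with a TTL, a small cache entry, and a rate-limit counter. It assumes a Redis-compatible store and the redis-py client; the keys, TTLs, and data are illustrative:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

# Session data keyed by a known ID, expiring automatically after 30 minutes.
r.set("session:abc123", json.dumps({"user_id": 1234, "roles": ["editor"]}), ex=1800)

# Cache a value the app would otherwise recompute or re-query.
r.set("user:1234:permissions", json.dumps(["read", "write"]), ex=300)

# A simple rate-limit counter: increment, then start the window on first use.
hits = r.incr("ratelimit:user:1234")
if hits == 1:
    r.expire("ratelimit:user:1234", 60)  # window resets after 60 seconds
```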
When memory is limited, systems often use eviction policies (like least-recently-used) to remove old entries. Some products are memory-first, while others can persist data to disk for durability. Choosing between memory and disk typically comes down to whether you’re optimizing for speed (memory) or retention/recovery (disk or persistence).
Key-value stores shine when you already know the key. They’re less suited when your questions are open-ended.
Many have limited query patterns compared to SQL databases. Support for secondary indexes (querying by fields inside the value) varies: some provide it, some provide partial options, and others encourage you to maintain your own lookup keys.
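One common workaround is to maintain your own lookup key alongside the record. A tiny sketch, again assuming a Redis-style store (key names are made up):

```python
import json
import redis

r = redis.Redis()

# Store the record under its primary key, plus a manual "email -> id" lookup key.
r.set("user:1234:profile", json.dumps({"email": "ada@example.com", "name": "Ada"}))
r.set("user:email:ada@example.com", "1234")

# Later: resolve email -> id -> record with two key lookups.
user_id = r.get("user:email:ada@example.com")
profile = r.get(f"user:{user_id.decode()}:profile") if user_id else None
```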
Key-value stores are a great fit for:

- Caching in front of a slower primary database
- Session storage and short-lived tokens
- Rate-limit counters, feature flags, and other fast lookups by ID
If your access pattern is “fetch/update by ID” and latency matters, a key-value store is often the simplest way to get reliable speed.
Wide-column databases (sometimes called wide-column stores) organize data into column families. Instead of thinking in terms of one fixed table with the same columns for every row, you group related columns together and can store different sets of columns per row within a family.
Despite the similar names, wide-column databases are not the same as a columnar database used for analytics.
A columnar database stores each column separately to scan huge datasets efficiently (great for reporting and aggregates). A wide-column database is built for operational workloads at very large scale, where you need to write and read lots of records quickly across many machines.
Wide-column systems are designed for:

- Very high write throughput (append-heavy ingestion)
- Horizontal scaling across many machines
- Fast, predictable reads when you query by a known key
The most common pattern is:

- A partition key that groups related rows together (for example, a device or user ID)
- A clustering key (often a timestamp) that orders rows within each partition
This makes them a strong fit for time-ordered data and append-heavy workloads.
With wide-column databases, data modeling is query-driven: you usually design tables around the exact queries you need to run. That can mean duplicating data in different shapes to support different access patterns.
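A purely conceptual in-memory sketch of that idea (no specific wide-column product assumed): rows are grouped by a partition key and ordered by a time-based clustering key, so the query you designed for stays a single cheap read:

```python
from collections import defaultdict

# Partition key: (sensor_id, day). Clustering key: timestamp.
table = defaultdict(list)

def write_reading(sensor_id, day, ts, value):
    # Appends are cheap; each partition collects the rows for one sensor-day.
    table[(sensor_id, day)].append((ts, value))

def latest_readings(sensor_id, day, limit=10):
    # A real store keeps rows ordered by the clustering key; we sort here for simplicity.
    return sorted(table[(sensor_id, day)], reverse=True)[:limit]

write_reading("sensor-42", "2024-06-01", "2024-06-01T10:00:00Z", 21.5)
write_reading("sensor-42", "2024-06-01", "2024-06-01T10:01:00Z", 21.7)
print(latest_readings("sensor-42", "2024-06-01"))
```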
They also tend to offer limited joins and fewer ad-hoc query options than a relational database. If your application relies on complex relationships and flexible querying, you may feel constrained.
Wide-column databases are often used for IoT events, messaging and activity streams, and other large-scale operational data where fast writes and predictable key-based reads matter more than rich relational queries.
Graph databases store data the way many real systems behave: as things connected to other things. Instead of forcing relationships into tables and join tables, the connections are part of the model.
A graph typically has:

- Nodes: the entities (people, products, accounts)
- Edges (relationships): the connections between them (follows, bought, reports to)
- Properties: attributes attached to both nodes and edges
This makes it natural to represent networks, hierarchies, and many-to-many relationships without contorting your schema.
Relationship-heavy queries often require many joins in a relational database. Each additional join can add complexity and cost as your data grows.
Graph databases are designed for traversals—walking from one node to connected nodes, then to their connections, and so on. When your questions routinely look like “find connected things within 2–6 steps,” traversals can stay fast and readable even as the network expands.
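Graph databases run these traversals natively, but the shape of the question is easy to sketch. Here's a tiny made-up social graph and a breadth-first walk that finds everything within a given number of hops:

```python
from collections import deque

graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}

def within_hops(start, max_hops):
    # Breadth-first traversal: visit neighbors, then their neighbors, and so on.
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append((neighbor, depth + 1))
                frontier.append((neighbor, depth + 1))
    return found

print(within_hops("alice", 2))  # friends plus friends-of-friends
```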
Graph databases shine for:

- Social networks and recommendations (“people who bought this also bought…”)
- Fraud detection across shared devices, cards, and addresses
- Knowledge graphs, org charts, and permission hierarchies
Graphs can be a shift for teams: data modeling is different, and query languages (often Cypher, Gremlin, or SPARQL) may be new. You’ll also want clear conventions for relationship types and direction to keep the model maintainable.
If your relationships are simple, your queries are mostly filtering/aggregations, and a handful of joins covers the “connected” parts, a relational database may remain the most straightforward choice—especially when transactions and reporting are already working well.
Vector databases are designed for a specific kind of question: “Which items are most similar to this one?” Instead of matching exact values (like an ID or a keyword), they compare embeddings—numeric representations of content (text, images, audio, products) produced by AI models. Items with similar meaning tend to have embeddings that end up close together in a multi-dimensional space.
A normal search might miss results if the wording is different (“laptop sleeve” vs. “notebook case”). With embeddings, similarity is based on meaning, so the system can surface relevant results even when the exact words don’t match.
The main operation is nearest neighbor search: given a query vector, retrieve the closest vectors.
In real apps, you usually combine similarity with filters, such as:

- Only documents the current user is allowed to see
- Only products that are in stock or in a given category
- Only content in a certain language or date range
This “filter + similarity” pattern is how vector search becomes practical for real datasets.
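Here's a brute-force sketch of “filter + similarity” using NumPy and random made-up vectors. Real vector databases answer the same question with approximate nearest-neighbor indexes rather than a full scan:

```python
import numpy as np

embeddings = np.random.rand(1000, 384)                  # one vector per item
metadata = [{"in_stock": i % 3 != 0} for i in range(1000)]

def search(query_vec, top_k=5, require_in_stock=True):
    # Filter on metadata first, then rank remaining items by cosine similarity.
    candidates = [i for i, m in enumerate(metadata)
                  if m["in_stock"] or not require_in_stock]
    vecs = embeddings[candidates]
    sims = vecs @ query_vec / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(query_vec))
    best = np.argsort(-sims)[:top_k]
    return [(candidates[i], float(sims[i])) for i in best]

results = search(np.random.rand(384))
```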
Common uses include:

- Semantic search over documents and support content
- Retrieval for RAG (fetching relevant context for an LLM)
- Recommendations and “more like this” features
- Detecting near-duplicate content
Vector search relies on specialized indexes. Building and updating those indexes can take time, and they can use significant memory. You’ll also often choose between higher recall (finding more of the true best matches) and lower latency (faster responses).
Vector databases rarely replace your main database. A common setup is: store the “source of truth” (orders, users, documents) in a relational database or document database, and store embeddings + search indexes in a vector database—then join results back to the primary store for full records and permissions.
Time-series databases (TSDBs) are designed for data that arrives continuously and is always tied to a timestamp. Think of CPU usage every 10 seconds, API latency for each request, sensor readings every minute, or stock prices changing multiple times per second.
Most time-series records combine:

- A timestamp
- One or more measured values (CPU %, latency, temperature, price)
- Tags or labels describing the source (service, region, device ID)
This structure makes it easy to ask questions like “show error rate by service” or “compare latency across regions.”
Because the data volume can grow quickly, TSDBs typically focus on:

- Fast, append-heavy ingestion
- Compression tuned for timestamped values
- Downsampling and rollups (keeping coarser summaries of older data)
- Retention policies that expire raw data automatically
These features keep storage and query costs predictable without constant manual cleanup.
TSDBs shine when you need time-based calculations, such as:

- Rates and counts per interval (requests per minute, errors per hour)
- Moving averages and percentiles over a window
- Comparisons against a previous period (this week vs. last week)
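The core of most of these calculations is bucketing points into time windows. A rough sketch with made-up latency samples (a real TSDB does this natively, plus rollups and retention):

```python
from collections import defaultdict
from statistics import mean

points = [
    (1717240800, 120.0), (1717240815, 180.0), (1717240830, 95.0),
    (1717240860, 210.0), (1717240875, 160.0),
]  # (unix_timestamp, latency_ms)

# Downsample into per-minute averages by snapping each timestamp to a 60 s bucket.
buckets = defaultdict(list)
for ts, value in points:
    buckets[ts - ts % 60].append(value)

per_minute_avg = {bucket: mean(values) for bucket, values in sorted(buckets.items())}
print(per_minute_avg)
```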
Typical use cases include monitoring, observability, IoT/sensors, and financial tick data.
The tradeoff: TSDBs are not the best choice for complex, ad-hoc relationships across many entities (for example, deeply nested joins like “users → teams → permissions → projects”). For that, a relational or graph database is usually a better fit.
A data warehouse is less a single “type of database” and more a workload + architecture: many teams querying large historical data to answer business questions (revenue trends, churn, inventory risk). You can buy it as a managed product, but what makes it a warehouse is how it’s used—centralized, analytical, and shared.
Most warehouses accept data in two common ways:

- Batch loads: scheduled ETL/ELT jobs that move data in bulk (hourly, nightly)
- Streaming ingestion: events loaded continuously, close to real time
Warehouses are usually optimized for analytics with a few practical tricks:

- Columnar storage and aggressive compression
- Partitioning and clustering so queries scan less data
- Caching and materialized views for repeated dashboard queries
- Separating storage from compute so each can scale independently
Once multiple departments rely on the same numbers, you’ll need access control (who can see what), audit trails (who queried/changed data), and lineage (where a metric came from and how it was transformed). This is often as important as query speed.
A lakehouse blends warehouse-style analytics with a data lake’s flexibility—useful when you want one place for both curated tables and raw files (logs, images, semi-structured events), without duplicating everything. It’s a good fit when data volume is high, formats vary, and you still need SQL-friendly reporting.
Choosing among database types is less about “best” and more about fit: what you need to query, how quickly, and what happens when parts of the system fail.
A quick rule of thumb:

- OLTP (online transaction processing): many small, frequent reads and writes, such as placing orders, logging in, or updating a record.
- OLAP (online analytical processing): large scans and aggregations over historical data, such as reports, dashboards, and trend analysis.
Relational databases often shine for OLTP; columnar systems, warehouses, and lakehouses are commonly used for OLAP.
When a network hiccup splits your system, you typically can’t have all three at once:

- Consistency: every read sees the most recent write
- Availability: every request gets a response, even during problems
- Partition tolerance: the system keeps operating when nodes can’t reach each other
Many distributed databases choose to stay available during issues and reconcile later (eventual consistency). Others prioritize strict correctness, even if that means refusing some requests until things are healthy.
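One simple (and simplified) reconciliation strategy is last-write-wins: every write carries a timestamp, and when replicas disagree, the newest version is kept. A conceptual sketch with made-up data; real systems often use more careful schemes such as vector clocks or CRDTs:

```python
replica_a = {"cart:42": {"items": ["book"], "updated_at": 1717241000}}
replica_b = {"cart:42": {"items": ["book", "pen"], "updated_at": 1717241030}}

def reconcile(a, b):
    # Keep the newest version of each key across both replicas.
    merged = {}
    for key in set(a) | set(b):
        versions = [v for v in (a.get(key), b.get(key)) if v is not None]
        merged[key] = max(versions, key=lambda v: v["updated_at"])
    return merged

print(reconcile(replica_a, replica_b))  # the later write wins
```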
If many users update the same data, you need clear rules. Transactions bundle steps into “all-or-nothing.” Locking and isolation levels prevent conflicts, but can reduce throughput; looser isolation improves speed but may allow anomalies.
Plan for backups, replication, and disaster recovery early. Also consider how easy it is to test restores, monitor lag, and perform upgrades—these day-two details often matter as much as query speed.
Choosing between the major types of databases is less about what’s trendy and more about what you need to do with your data. A practical way to start is to work backward from your queries and workloads.
Write down the top 5–10 things your app or team must do, for example:

- “Fetch this user’s profile by ID in under 50 ms”
- “Show revenue by week for the last year”
- “Find products similar to this one”
- “Ingest thousands of events per second and keep 90 days of history”
This narrows the options faster than any feature checklist.
Use this quick “shape” checklist:

- Structured rows with strict relationships → relational
- Self-contained, evolving JSON-like objects → document
- Lookups by a known key at low latency → key-value
- Timestamped streams of measurements → time-series
- Heavily connected data and multi-hop questions → graph
- Similarity over embeddings → vector
- Large aggregations over history → columnar / warehouse
Performance targets define architecture. Set rough numbers (p95 latency, reads/writes per second, data retention). Cost usually follows: the more data you store, scan, and replicate, and the faster you need answers, the more you’ll spend.
Many teams use two databases: one for operations (e.g., relational) and one for analytics (e.g., columnar/warehouse). The “right” choice is the one that makes your most important queries simplest, fastest, and cheapest to run reliably.
If you’re prototyping or shipping new features fast, the database decision is often coupled to your development workflow. Platforms like Koder.ai (a vibe-coding platform that generates web, backend, and mobile apps from chat) can make this more concrete: for example, Koder.ai’s default backend stack uses Go + PostgreSQL, which is a strong starting point when you need transactional correctness and broad SQL tooling.
As your product grows, you can still add specialized databases (like a vector database for semantic search or a columnar warehouse for analytics) while keeping PostgreSQL as the system of record. The key is to start with the workloads you must support today—and keep the door open for “add a second store” when the query patterns demand it.
A “database type” is shorthand for three things:

- How the system stores data
- How you query it
- What it’s optimized to do
Picking the type is really picking defaults for performance, cost, and operational complexity.
Start from your top 5–10 queries and write patterns, then map them to the strengths summarized in the comparison table below.
Relational databases are a strong default when you need:

- ACID transactions and strong constraints
- Joins across well-defined tables
- Broad compatibility with SQL reporting and tooling
They can become painful when you’re doing constant schema changes, or when you need extreme horizontal scale with lots of join-heavy queries spread across shards.
ACID is a reliability guarantee for multi-step changes: atomicity (all-or-nothing), consistency (constraints always hold), isolation (concurrent transactions don’t interfere), and durability (committed changes survive crashes).
It matters most for workflows where mistakes are expensive (payments, bookings, inventory updates).
Columnar databases are best when queries scan many rows, read only a few columns, and aggregate heavily (SUM, COUNT, AVG, GROUP BY). They’re less ideal for OLTP-style workloads like frequent small updates or “fetch one record by ID” patterns, which row-stores tend to handle more naturally.

A document database is a good fit when:

- Your data maps naturally to self-contained, JSON-like objects
- Different records need different sets of fields
- The schema changes often as the product evolves
Watch for tradeoffs around complex joins, duplicated data for read performance, and the performance cost of multi-document transactions.
Use a key-value store when your access pattern is mostly:

- Fetching or updating a record by a known key
- Low-latency, high-throughput lookups (caching, sessions, counters)
Plan around limitations: ad-hoc querying is usually weak, and secondary indexing support varies—often you design keys and additional lookup keys yourself.
Despite the similar name, they target different workloads:

- A columnar database stores each column separately so analytical queries can scan huge datasets efficiently.
- A wide-column database groups columns into families and is built for operational workloads: high-volume writes and key-based reads across many machines.
Wide-column systems typically require query-driven modeling (design tables around the exact access patterns) and don’t aim to be flexible like SQL with joins.
Use a graph database when your core questions are about relationships, like:

- “Who is connected to whom, and how closely?”
- “Which accounts share devices, cards, or addresses?” (fraud detection)
- “What should we recommend based on what similar users liked?”
Graphs excel at traversals (walking relationships) where a relational approach would need many joins. The tradeoff is adopting new modeling conventions and query languages (often Cypher/Gremlin/SPARQL).
A vector database is designed for similarity search over embeddings (numeric representations of meaning). It’s commonly used for:

- Semantic search (“find content that means the same thing”)
- Retrieval for RAG pipelines
- Recommendations and “more like this” features
In practice, it’s usually paired with a relational/document store: keep the source-of-truth data there, store embeddings + vector indexes in the vector DB, then join results back for full records and permissions.
| Primary use case | Best fit (often) | Why |
|---|---|---|
| Transactions, invoices, user accounts | Relational (SQL) | Strong constraints, joins, consistency |
| App data with evolving fields | Document | Flexible schema, natural JSON |
| Real-time caching/session state | Key-value store | Fast lookups by key |
| Clickstreams/metrics over time | Time-series | High ingest + time-based queries |
| BI dashboards, large aggregations | Columnar | Fast scans + compression |
| Social/knowledge relationships | Graph | Efficient relationship traversal |
| Semantic search, RAG retrieval | Vector | Similarity search over embeddings |
| Massive operational data at scale | Wide-column | Horizontal scaling, predictable queries |
If you do both OLTP and analytics, plan for two systems early (operational DB + analytics DB).