Claude Code for data import/export correctness: define validation rules, consistent error formats, and fuzz tests for CSV/JSON imports to reduce edge-case support tickets.

Imports rarely fail because the code is “wrong”. They fail because real data is messy, inconsistent, and produced by people who never saw your assumptions.
CSV problems are usually about shape and formatting. JSON problems are usually about meaning and types. Both can break in ways that look minor but create confusing results.
These issues show up again and again in support tickets.
Correctness is not just “did it import”. You have to decide which outcomes are allowed, because users notice silent mistakes more than loud failures.
Most teams can agree on three outcomes:

- Full success: every row imported.
- Full rejection: nothing imported, with a report explaining why.
- Partial import: valid rows imported, failed rows reported so the user can fix and retry.
Edge cases turn into rework when people can’t tell what went wrong or how to fix it quickly. A common scenario: a customer uploads a CSV of 5,000 rows, the importer says “Invalid format”, and they retry three times with random edits. That becomes multiple tickets plus someone on your side trying to reproduce the file locally.
Set goals that reduce the cycle: fewer retries, faster fixes, predictable results. Before you write rules, decide what “partial” means (and whether you allow it), how you’ll report row-level issues, and what users should do next (edit the file, map fields, or export a corrected version). If you’re using a vibe-coding platform like Koder.ai (koder.ai) to generate validators and tests quickly, the import contract is still what keeps that behavior consistent as the product evolves.
Before you write a single validation rule, decide what “valid input” means for your product. Most import bugs are mismatched expectations between what users upload and what your system silently assumes.
Start with formats, and be explicit. “CSV” can mean comma or semicolon, a header row or not, UTF-8 or “whatever Excel produced.” For JSON, decide whether you accept a single object, an array of records, or JSON Lines (one JSON object per line). If you accept nested JSON, define which paths you read and which ones you ignore.
Then lock down the field contract. For every field, decide whether it’s required, optional, or optional with a default. Defaults are part of the contract, not an implementation detail. If country is missing, do you default to empty, choose a specific country, or reject the row?
Parsing behavior is where “tolerant” imports create long-term pain. Decide upfront how strict you are about trimming spaces, normalizing case, and accepting variants like "yes"/"true"/"1". Tolerance is fine if it’s predictable and documented.
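For example, a minimal sketch in Python of a documented boolean parser; the accepted variants here are illustrative, and your contract may choose a different set:

```python
# Hypothetical normalizer: accept a documented set of boolean variants
# and reject everything else instead of guessing.
ACCEPTED_TRUE = {"yes", "true", "1"}
ACCEPTED_FALSE = {"no", "false", "0"}

def parse_bool(raw: str) -> bool:
    """Parse a boolean cell. Trims whitespace and lowercases,
    which must itself be documented as part of the import contract."""
    value = raw.strip().lower()
    if value in ACCEPTED_TRUE:
        return True
    if value in ACCEPTED_FALSE:
        return False
    raise ValueError(
        f"INVALID_BOOLEAN: expected one of "
        f"{sorted(ACCEPTED_TRUE | ACCEPTED_FALSE)}, got {raw!r}"
    )
```

The point is not the variant list; it's that the tolerance is finite, written down, and testable.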
Duplicates are another contract decision that affects correctness and trust. Define what counts as a duplicate (same email, same external_id, same combination of fields), where you detect it (within the file, against existing data, or both), and what you do when it happens (keep first, keep last, merge, or reject).
A contract checklist you can paste into a spec:

- Accepted formats and encodings (delimiter, header row, UTF-8, single object vs array vs JSON Lines)
- Per-field requirements: required, optional, or optional with a default (and what the default is)
- Normalization and tolerance: trimming, case, accepted variants
- Duplicate policy: what counts as a duplicate, where it's detected, and what happens when it is
Example: importing “customers.” If email is the unique key, decide whether " Ana@Example.COM " equals "ana@example.com", whether a missing email is allowed when external_id exists, and whether duplicates inside the file should be rejected even if the database has no match. Once this contract is fixed, consistent behavior across UI and API is much easier, whether you implement it in Koder.ai or elsewhere.
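A sketch of that duplicate-detection key in Python, assuming the contract says trim, Unicode-normalize, and lowercase; the policy itself is a product decision, not something the code should invent:

```python
import unicodedata

def normalize_email_key(raw: str):
    """Canonical form used ONLY for duplicate detection
    (hypothetical policy: NFC-normalize, trim, lowercase)."""
    value = unicodedata.normalize("NFC", raw).strip().lower()
    return value or None  # empty cell means "missing", per the contract

def find_in_file_duplicates(rows):
    """Return 1-based row numbers whose email key repeats within the file."""
    seen, dupes = {}, []
    for i, row in enumerate(rows, start=1):
        key = normalize_email_key(row.get("email", ""))
        if key is None:
            continue  # missing email is judged by a separate required-field rule
        if key in seen:
            dupes.append(i)
        else:
            seen[key] = i
    return dupes
```

With this split, the dedupe rule and the required-field rule stay independent and independently testable.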
Messy imports often start with a single giant validate() function. A cleaner approach is layered rules with clear names and small functions. That makes changes easier to review, and tests easier to write.
Start with field-level rules: checks a single value can pass or fail on its own (type, range, length, allowed values, regex). Keep them boring and predictable. Examples: email matches a basic email pattern, age is an integer between 0 and 120, status is one of active|paused|deleted.
Add cross-field rules only where they matter. These checks depend on multiple fields, and bugs hide here. Classic examples: startDate must be before endDate, or total equals subtotal + tax - discount. Write these rules so they can point to specific fields, not just “record invalid”.
Separate record-level rules from file-level rules. A record-level rule checks one row (CSV) or one object (JSON). A file-level rule checks the whole upload: required headers exist, a unique key doesn’t repeat across rows, column count matches expectations, or the file declares a supported version.
Normalization should be explicit, not “magic”. Decide what you normalize before validating, and document it. Common examples include trimming spaces, Unicode normalization (so visually identical characters compare the same), and formatting phone numbers into one consistent storage format.
A structure that stays readable:

- Normalize first (explicitly), then validate.
- Field-level rules: one value, pass/fail.
- Cross-field rules: several values, pointing at specific fields.
- Record-level rules: one row or object.
- File-level rules: headers, unique keys, column counts, versions.
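The layering might look like this in Python; the rule names and registry shape are illustrative, not a prescribed API:

```python
import re

# Field-level rules: one value in, an error code or None out.
def check_email(value):
    return None if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) else "INVALID_EMAIL"

def check_status(value):
    return None if value in {"active", "paused", "deleted"} else "INVALID_STATUS"

FIELD_RULES = {"email": check_email, "status": check_status}

# Cross-field rules: several values in, and they name the fields at fault.
def check_date_order(record):
    if record["startDate"] > record["endDate"]:  # ISO dates compare lexically
        return {"code": "DATE_ORDER", "fields": ["startDate", "endDate"]}
    return None

def validate_record(record):
    """Record-level validation: run field rules, then cross-field rules."""
    errors = []
    for field, rule in FIELD_RULES.items():
        code = rule(record.get(field, ""))
        if code:
            errors.append({"code": code, "fields": [field]})
    issue = check_date_order(record)
    if issue:
        errors.append(issue)
    return errors
```

Because each rule is a small named function, a test can target exactly one behavior, and a review can see exactly what changed.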
Version your rules. Put a schemaVersion (or import “profile”) in the file or API request. When you change what “valid” means, you can still re-import older exports using the older version. That one choice prevents a lot of “it used to work yesterday” tickets.
A good importer fails in a helpful way. Vague errors lead to random retries and avoidable support work. A clear error format helps users fix the file quickly, and helps you improve validation without breaking clients.
Start with a stable error object shape and keep it consistent across CSV and JSON. You can use Claude Code to propose a schema and a few realistic examples, then lock it down as part of the import contract.
Treat each error as a small record with fields that don’t change. The message can evolve, but the code and location should stay stable.
- code: a short, stable identifier like REQUIRED_MISSING or INVALID_DATE
- message: a human-friendly sentence for the UI
- path: where the problem is (a JSON pointer like /customer/email, or a column name like email)
- row or line: for CSV, include the 1-based row number (and optionally the original line)
- severity: at least error and warning

Make errors actionable. Include what you expected and what you actually saw, and when possible show an example that would pass. For instance: expected YYYY-MM-DD, got 03/12/24.
Even if you return a flat list, include enough data to group errors by row and by field. Many UIs want “Row 12 has 3 issues” and then highlight each column. Support teams like grouping because patterns become obvious (for example, every row is missing country).
A compact response might look like this:
{
  "importId": "imp_123",
  "status": "failed",
  "errors": [
    {
      "code": "INVALID_DATE",
      "message": "Signup date must be in YYYY-MM-DD.",
      "path": "signup_date",
      "row": 12,
      "severity": "error",
      "expected": "YYYY-MM-DD",
      "actual": "03/12/24"
    },
    {
      "code": "UNKNOWN_FIELD",
      "message": "Column 'fav_colour' is not recognized.",
      "path": "fav_colour",
      "row": 1,
      "severity": "warning"
    }
  ]
}
Plan for localization without changing error codes. Keep code language-neutral and durable, and treat message as replaceable text. If later you add messageKey or translated messages, old clients can still rely on the same codes for filtering, grouping, and analytics.
To avoid “mystery imports,” your API response should answer two questions: what happened, and what should the user do next.
Even when there are errors, return a consistent summary so the UI and support tooling can handle every import the same way.
Include:
- created, updated, skipped, failed counts
- totalRows (or totalRecords for JSON)
- mode (for example: "createOnly", "upsert", or "updateOnly")
- startedAt and finishedAt timestamps
- a correlationId support can ask for

That correlationId is worth it. When someone reports “it didn’t import,” you can find the exact run and error report without guessing.
Don’t dump 10,000 row errors into the response. Return a small sample (say 20) that shows the pattern, and provide a separate way to retrieve the full report if needed.
Make each error specific and stable: include the row, the field, a stable code, a human-readable message, and the offending value.
Example response shape (success with some row failures):
{
  "importId": "imp_01HZY...",
  "correlationId": "c_9f1f2c2a",
  "status": "completed_with_errors",
  "summary": {
    "totalRows": 1200,
    "created": 950,
    "updated": 200,
    "skipped": 10,
    "failed": 40
  },
  "errorsSample": [
    {
      "row": 17,
      "field": "email",
      "code": "invalid_format",
      "message": "Email must contain '@'.",
      "value": "maria.example.com"
    }
  ],
  "report": {
    "hasMore": true,
    "nextPageToken": "p_002"
  },
  "next": {
    "suggestedAction": "review_errors"
  }
}
Notice the next field. Even a minimal success payload should help the product move forward: show a review screen, offer a retry, or open the imported collection.
People retry. Networks fail. If the same file is imported twice, you want predictable results.
Be explicit about idempotency: accept an idempotencyKey (or compute a file hash), and return the existing importId if the request is a repeat. If your mode is upsert, define the matching rule (for example, “email is the unique key”). If it’s create-only, return “skipped” for duplicates, not “created again.”
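One way to sketch content-hash idempotency in Python; the in-memory registry here stands in for whatever store your importer actually uses:

```python
import hashlib

def idempotency_key_for(file_bytes: bytes, mode: str) -> str:
    """Derive a stable key from file content plus import mode, so the
    same upload in the same mode maps to the same import run."""
    digest = hashlib.sha256(file_bytes).hexdigest()
    return f"{mode}:{digest}"

# Hypothetical run registry: a repeated request returns the existing importId.
_runs = {}

def start_import(file_bytes: bytes, mode: str, new_import_id: str) -> str:
    key = idempotency_key_for(file_bytes, mode)
    if key in _runs:
        return _runs[key]  # repeat request: same importId, no duplicate work
    _runs[key] = new_import_id
    return new_import_id
```

Including the mode in the key is a deliberate choice: the same file in upsert mode and createOnly mode are different operations.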
If the whole request is invalid (bad auth, wrong content type, unreadable file), fail fast and return status: "rejected" with a short error list. If the file is valid but has row-level problems, treat it as a completed job with failed > 0 so users can fix and re-upload without losing the summary.
A useful habit: make the model write the contract in a structured format, not as prose. “Helpful paragraphs” often skip details like trimming rules, default values, and whether a blank cell means “missing” or “empty”.
Use a prompt that forces a table a human can review quickly and a developer can turn into code. Ask for each field’s rule, pass and fail examples, and an explicit note for anything ambiguous (for example, empty string vs null).
You are helping design an importer for CSV and JSON.
Output a Markdown table with columns:
Field | Type | Required? | Normalization | Validation rules | Default | Pass examples | Fail examples
Rules must be testable (no vague wording).
Then output:
1) A list of edge cases to test (CSV + JSON).
2) Proposed test names with expected result (pass/fail + error code).
Finally, list any contradictions you notice (required vs default, min/max vs examples).
After the first draft, tighten it by asking for one positive and one negative example per rule. That pushes coverage of tricky corners like empty strings, whitespace-only values, missing columns, null vs "null", very large integers, scientific notation, duplicate IDs, and extra JSON fields.
For a concrete scenario, imagine importing “customers” from CSV: email is required, phone is optional, and signup_date defaults to today if missing. The model should flag a contradiction if you also say “signup_date is required”. It should propose tests like import_customers_missing_email_returns_row_error and specify the error code and message shape you return.
Do one more pass before implementation: ask the model to restate the rules as a checklist and point out where defaults, required fields, and normalization might conflict. That review step catches a lot of ticket-worthy behavior.
Fuzz testing stops “weird files” from becoming support tickets. Start from a small set of known-good CSV/JSON files, then generate thousands of slightly broken variations and make sure your importer reacts safely and clearly.
Start with a small seed corpus of valid examples that represent real usage: the smallest valid file, a typical file, and a large file. For JSON, include one object, many objects, and nested structures if you support them.
Then add an automated mutator that tweaks one thing at a time. Keep mutations reproducible by logging the random seed so you can replay failures.
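A minimal Python sketch of such a mutator; the mutation list is illustrative, and the key point is that the same seed always reproduces the same broken file:

```python
import random

MUTATIONS = ["drop_header", "swap_delimiter", "truncate_last_row", "blank_random_cell"]

def mutate_csv(text: str, seed: int):
    """Apply one random mutation to a known-good CSV. Logging the seed
    makes failures replayable: mutate_csv(sample, seed) is deterministic."""
    rng = random.Random(seed)  # isolated RNG, deterministic per seed
    mutation = rng.choice(MUTATIONS)
    lines = text.splitlines()
    if mutation == "drop_header":
        lines = lines[1:]
    elif mutation == "swap_delimiter":
        lines = [line.replace(",", ";") for line in lines]
    elif mutation == "truncate_last_row":
        lines[-1] = lines[-1][: len(lines[-1]) // 2]
    elif mutation == "blank_random_cell":
        row = rng.randrange(len(lines))
        cells = lines[row].split(",")
        cells[rng.randrange(len(cells))] = ""
        lines[row] = ",".join(cells)
    return mutation, "\n".join(lines)
```

In CI you would loop over thousands of seeds, feed each mutated file to the importer, and record any seed whose result violates the pass rules.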
Fuzz dimensions that catch most real-world problems:

- Delimiters and quoting: commas vs semicolons, unbalanced quotes, embedded newlines
- Encodings: UTF-8 with and without a BOM, stray non-breaking spaces
- Shape: missing, extra, or duplicate headers; truncated rows; wrong column counts
- Values: empty strings, whitespace-only cells, null vs "null", very large integers, scientific notation
- JSON structure: extra fields, unexpected nesting, NaN/Infinity
Don’t stop at syntax. Add semantic fuzz too: swap similar fields (email vs username), extreme dates, duplicate IDs, negative quantities, or values that violate enums.
Fuzz tests only help if pass criteria are strict. Your importer should never crash or hang, and errors should be consistent and actionable.
A practical set of pass rules:

- The importer never crashes or hangs, no matter the input.
- Every failure produces the standard error shape with a stable code.
- The same input always produces the same output (deterministic).
- Errors point at a row and field whenever one exists.
Run these tests in CI on every change. When you find a failure, save the exact file as a fixture and add a regression test so it never returns.
If you use Claude Code for this work, have it generate seed fixtures that match your contract, a mutation plan, and the expected error outputs. You still choose the rules, but you get a wide test surface fast, especially for CSV quoting and JSON corner cases.
Most import tickets come from unclear rules and unhelpful feedback.
One common trap is “best effort” parsing that isn’t written down. If your importer silently trims spaces, accepts both commas and semicolons, or guesses date formats, users build workflows around those guesses. Then a small change, or a different file generator, breaks everything. Pick the behavior, document it, and test it.
Another repeat offender is the generic error message. “Invalid CSV” or “Bad request” forces users to guess. They upload the same file five times, and support ends up asking for the file anyway. Errors should point to a row, a field, a clear reason, and a stable code.
Failing the whole file for one bad row is also a frequent pain point. Sometimes that’s correct (for example, financial imports where partial data is dangerous). Many business imports can continue and report a summary, as long as you offer an explicit choice like strict mode vs partial import.
Text encoding issues create stubborn tickets. UTF-8 is the right default, but real CSVs often include a BOM, curly quotes, or non-breaking spaces copied from spreadsheets. Handle these consistently and report what you detected so users can fix their export settings.
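For illustration, a Python pre-pass that strips a UTF-8 BOM and replaces common spreadsheet artifacts while recording what it found; the exact replacement set is an assumption your contract should pin down:

```python
# Hypothetical pre-pass: normalize spreadsheet artifacts before parsing,
# and report what was detected so users can fix their export settings.
REPLACEMENTS = {
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u00a0": " ",                  # non-breaking space
}

def preprocess(raw: bytes):
    detected = []
    if raw.startswith(b"\xef\xbb\xbf"):
        detected.append("utf8_bom")
        raw = raw[3:]
    text = raw.decode("utf-8")  # contract says UTF-8 only; anything else is rejected
    for bad, good in REPLACEMENTS.items():
        if bad in text:
            detected.append(f"replaced_{hex(ord(bad))}")
            text = text.replace(bad, good)
    return text, detected
```

Returning the `detected` list is what turns a silent fix into something you can surface in the import summary.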
Finally, changing error codes between releases breaks clients and automations. Improve wording if you want, but keep codes and meanings stable. Only version them when you truly have to.
Traps worth guarding against up front:

- Undocumented “best effort” parsing that users build workflows around
- Generic errors (“Invalid CSV”) that force guessing
- All-or-nothing failure with no strict-vs-partial choice
- Encoding surprises: BOM, curly quotes, non-breaking spaces
- Error codes that change between releases
Example: a customer exports a CSV from Excel, which adds a BOM and formats dates as 03/04/2026. Your importer guesses MM/DD, but the customer expected DD/MM. If your error report includes the detected format, the exact field, and a suggested fix, the user can correct it without back-and-forth.
Most import problems are small mismatches between what users think the file means and what your system accepts. Treat this as a release gate.
A practical test: use one intentionally messy file. Example: a CSV where the header appears twice (two “email” columns), a boolean field uses “Y”, and a date is “03/04/05”. Your importer shouldn’t guess. It should either apply a documented mapping rule or reject with a specific error.
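Duplicate-header detection is a small file-level rule; a sketch in Python, with simplified error shapes:

```python
import csv
import io

def check_headers(csv_text: str, expected: set) -> list:
    """File-level rule: duplicate or unknown headers are reported up front,
    never silently merged or guessed."""
    header = next(csv.reader(io.StringIO(csv_text)))
    errors = []
    seen = set()
    for col in header:
        if col in seen:
            errors.append({"code": "DUPLICATE_HEADER", "column": col})
        seen.add(col)
        if col not in expected:
            errors.append({"code": "UNKNOWN_HEADER", "column": col})
    return errors
```

Running this before any row parsing means a structurally broken file fails once, loudly, instead of producing thousands of confusing row errors.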
Two checks teams often skip:
First, verify your importer reports errors with enough location detail to fix the source file. “Invalid date” isn’t actionable. “Row 42, column start_date: expected YYYY-MM-DD, got 03/04/05” is.
Second, run the same invalid file twice and compare results. If error order changes, codes change, or row numbers drift, users lose trust. Deterministic behavior is boring, and that’s the point.
A common real-world import is customer orders coming from a spreadsheet export. Someone exports a CSV from an old system, edits it in Excel, then uploads it. Most tickets happen when the importer silently “fixes” data, or when the error message doesn’t say what to change.
Imagine a file named orders.csv with columns: order_id,customer_email,order_date,currency,total_amount.
Here are the rows as the user would see them (one valid, two realistically broken):
order_id,customer_email,order_date,currency,total_amount
A-1001,ana@example.com,2026-01-05,USD,129.99
A-1002,not-an-email,01/06/2026,USD,49.00
,sam@example.com,2026-01-07,US, -10
Row 2 has an invalid email and an ambiguous date format. Row 3 is missing order_id, has an unsupported currency code (US instead of USD), and a negative amount.
If your API returns errors, keep the shape consistent and specific. Here’s an example response that supports partial success:
{
  "correlation_id": "imp_20260109_7f3a9d",
  "import_id": "ord_01HZZ...",
  "status": "partial_success",
  "summary": {
    "total_rows": 3,
    "imported_rows": 1,
    "failed_rows": 2
  },
  "errors": [
    {
      "row_number": 2,
      "field": "customer_email",
      "code": "invalid_email",
      "message": "Email must contain a valid domain.",
      "value": "not-an-email"
    },
    {
      "row_number": 2,
      "field": "order_date",
      "code": "invalid_date_format",
      "message": "Use ISO-8601 (YYYY-MM-DD).",
      "value": "01/06/2026"
    },
    {
      "row_number": 3,
      "field": "order_id",
      "code": "required",
      "message": "order_id is required.",
      "value": ""
    },
    {
      "row_number": 3,
      "field": "currency",
      "code": "unsupported_currency",
      "message": "Allowed values: USD, EUR, GBP.",
      "value": "US"
    },
    {
      "row_number": 3,
      "field": "total_amount",
      "code": "must_be_positive",
      "message": "total_amount must be greater than 0.",
      "value": " -10"
    }
  ],
  "retry": {
    "mode": "upload_failed_only",
    "failed_row_numbers": [2, 3]
  }
}
Partial success matters because users shouldn’t have to re-upload the entire file. A simple retry flow is: fix only the failed rows, export a small CSV containing rows 2 and 3, and re-upload. Your importer should treat this as idempotent when order_id is present, so “retry” updates the same records instead of creating duplicates.
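Building that smaller retry file can be as simple as this Python sketch; it assumes 1-based data-row numbers that exclude the header, matching failed_row_numbers in the response:

```python
def build_retry_csv(original: str, failed_rows: list) -> str:
    """Produce a retry file containing the header plus only the failed rows.
    failed_rows holds 1-based data-row numbers (header excluded)."""
    lines = original.splitlines()
    header, data = lines[0], lines[1:]
    keep = [data[n - 1] for n in failed_rows]
    return "\n".join([header] + keep)
```

Whether the product generates this file for the user or the user edits it by hand, the row numbers in the error report must use the same convention, or the fix loop breaks.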
For support, correlation_id is the fastest path to diagnosis. A support agent can ask for that single value, find the import run in logs, and confirm whether the parser saw extra columns, a wrong delimiter, or unexpected encoding.
Next steps that make this repeatable:

- Write the import contract down and version it.
- Keep failing files as fixtures with regression tests.
- Run fuzz tests in CI on every change.
- Give support a correlationId they can always ask for.
Most failures come from messy real-world data, not “bad code.” CSV issues are usually about shape (headers, delimiter, quoting, encoding), while JSON issues are usually about meaning (types, null vs empty, unexpected nesting). Treat both as untrusted input and validate against an explicit contract.
Define three outcomes up front:

- Full success: everything imported.
- Full rejection: nothing imported, with a clear report.
- Partial import: valid rows in, failed rows reported.
Pick a default (many products choose partial) and make it consistent across UI and API.
Write down an import contract before writing validation:

- Accepted formats and encodings
- Required, optional, and defaulted fields
- Normalization and tolerance rules
- Duplicate definition and handling
- A schemaVersion so older exports stay importable
This prevents “it worked yesterday” surprises when behavior changes.
Default to one unambiguous format per field (for example, dates as YYYY-MM-DD). If you accept variants, make it explicit and predictable (for example, accept true/false/1/0, but not every spreadsheet guess). Avoid guessing ambiguous dates like 01/02/03; either require ISO format or reject with a clear error.
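A strict ISO-date rule in Python might look like this; the error string format is illustrative:

```python
from datetime import date

def parse_iso_date(raw: str) -> date:
    """Accept only YYYY-MM-DD. Ambiguous forms like 01/02/03 are rejected
    with a stable code instead of being guessed."""
    try:
        return date.fromisoformat(raw.strip())
    except ValueError:
        raise ValueError(f"INVALID_DATE_FORMAT: expected YYYY-MM-DD, got {raw!r}")
```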
Decide:

- What counts as a duplicate (same email, same external_id, or a combination of fields)
- Where you detect it (within the file, against existing data, or both)
- What happens when it occurs (keep first, keep last, merge, or reject)
If users can retry imports, combine this with idempotency so the same upload doesn’t create duplicates.
Use layers instead of one giant validate():

- Field-level rules (type, range, allowed values)
- Cross-field rules (startDate before endDate)
- Record-level rules (one row or object)
- File-level rules (headers, unique keys, versions)
Small named rules are easier to test and safer to change.
Return a stable error shape with:

- code (a stable identifier)
- message (human-friendly)
- path/field (column name or JSON pointer)
- row/line (for CSV)
- severity (error vs warning)

Make it actionable by including what was expected and what was received when possible.

Always return a consistent summary, even when there are errors:

- created, updated, skipped, and failed counts, plus totalRows/totalRecords
- status (success, rejected, completed_with_errors)
- timestamps (startedAt, finishedAt)
- correlationId for support/debugging

For large files, include a small errorsSample and a way to fetch the full report later.

Support retries explicitly:

- Accept an idempotencyKey (or use a file hash)
- Return the same importId if the same request is repeated

Without this, normal user retries can double-create records.

Start with a few known-good seed files, then generate many small mutations (one change at a time):

- Delimiter and quoting changes
- Encoding changes (BOM, non-breaking spaces)
- Missing, extra, or duplicate headers
- Null vs empty vs "null", huge numbers, scientific notation (and NaN/Infinity in JSON)

A fuzz test “passes” when the importer never crashes/hangs and returns deterministic, actionable errors.