Claude Code for data import/export correctness: define validation rules, consistent error formats, and fuzz tests for CSV/JSON imports to reduce edge-case support tickets.

Imports rarely fail because the code is “wrong”. They fail because real data is messy, inconsistent, and produced by people who never saw your assumptions.
CSV problems are usually about shape and formatting. JSON problems are usually about meaning and types. Both can break in ways that look minor but create confusing results.
These issues show up again and again in support tickets.
Correctness is not just “did it import”. You have to decide which outcomes are allowed, because users notice silent mistakes more than loud failures.
Most teams can agree on three outcomes:

- Full success: every row imported.
- Full rejection: nothing imported, with a report explaining why.
- Partial import: valid rows imported, failed rows reported so the user can fix and retry.
Edge cases turn into rework when people can’t tell what went wrong or how to fix it quickly. A common scenario: a customer uploads a CSV of 5,000 rows, the importer says “Invalid format”, and they retry three times with random edits. That becomes multiple tickets plus someone on your side trying to reproduce the file locally.
Set goals that reduce the cycle: fewer retries, faster fixes, predictable results. Before you write rules, decide what “partial” means (and whether you allow it), how you’ll report row-level issues, and what users should do next (edit the file, map fields, or export a corrected version). If you’re using a vibe-coding platform like Koder.ai (koder.ai) to generate validators and tests quickly, the import contract is still what keeps that behavior consistent as the product evolves.
Before you write a single validation rule, decide what “valid input” means for your product. Most import bugs are mismatched expectations between what users upload and what your system silently assumes.
Start with formats, and be explicit. “CSV” can mean comma or semicolon, a header row or not, UTF-8 or “whatever Excel produced.” For JSON, decide whether you accept a single object, an array of records, or JSON Lines (one JSON object per line). If you accept nested JSON, define which paths you read and which ones you ignore.
Then lock down the field contract. For every field, decide whether it’s required, optional, or optional with a default. Defaults are part of the contract, not an implementation detail. If country is missing, do you default to empty, choose a specific country, or reject the row?
Parsing behavior is where “tolerant” imports create long-term pain. Decide upfront how strict you are about trimming spaces, normalizing case, and accepting variants like "yes"/"true"/"1". Tolerance is fine if it’s predictable and documented.
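For example, a minimal sketch in Python of a documented boolean parser; the accepted variants here are illustrative, and your contract may choose a different set:

```python
# Hypothetical normalizer: accept a documented set of boolean variants
# and reject everything else instead of guessing.
ACCEPTED_TRUE = {"yes", "true", "1"}
ACCEPTED_FALSE = {"no", "false", "0"}

def parse_bool(raw: str) -> bool:
    """Parse a boolean cell. Trims whitespace and lowercases,
    which must itself be documented as part of the import contract."""
    value = raw.strip().lower()
    if value in ACCEPTED_TRUE:
        return True
    if value in ACCEPTED_FALSE:
        return False
    raise ValueError(
        f"INVALID_BOOLEAN: expected one of "
        f"{sorted(ACCEPTED_TRUE | ACCEPTED_FALSE)}, got {raw!r}"
    )
```

The point is not the variant list; it's that the tolerance is finite, written down, and testable.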
Duplicates are another contract decision that affects correctness and trust. Define what counts as a duplicate (same email, same external_id, same combination of fields), where you detect it (within the file, against existing data, or both), and what you do when it happens (keep first, keep last, merge, or reject).
A contract checklist you can paste into a spec:

- Accepted formats and encodings (delimiter, header row, UTF-8, single object vs array vs JSON Lines)
- Per-field requirements: required, optional, or optional with a default (and what the default is)
- Normalization and tolerance: trimming, case, accepted variants
- Duplicate policy: what counts as a duplicate, where it's detected, and what happens when it is
Example: importing “customers.” If email is the unique key, decide whether " Ana@Example.COM " equals "ana@example.com", whether a missing email is allowed when external_id exists, and whether duplicates inside the file should be rejected even if the database has no match. Once this contract is fixed, consistent behavior across UI and API is much easier, whether you implement it in Koder.ai or elsewhere.
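A sketch of that duplicate-detection key in Python, assuming the contract says trim, Unicode-normalize, and lowercase; the policy itself is a product decision, not something the code should invent:

```python
import unicodedata

def normalize_email_key(raw: str):
    """Canonical form used ONLY for duplicate detection
    (hypothetical policy: NFC-normalize, trim, lowercase)."""
    value = unicodedata.normalize("NFC", raw).strip().lower()
    return value or None  # empty cell means "missing", per the contract

def find_in_file_duplicates(rows):
    """Return 1-based row numbers whose email key repeats within the file."""
    seen, dupes = {}, []
    for i, row in enumerate(rows, start=1):
        key = normalize_email_key(row.get("email", ""))
        if key is None:
            continue  # missing email is judged by a separate required-field rule
        if key in seen:
            dupes.append(i)
        else:
            seen[key] = i
    return dupes
```

With this split, the dedupe rule and the required-field rule stay independent and independently testable.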
Messy imports often start with a single giant validate() function. A cleaner approach is layered rules with clear names and small functions. That makes changes easier to review, and tests easier to write.
Start with field-level rules: checks a single value can pass or fail on its own (type, range, length, allowed values, regex). Keep them boring and predictable. Examples: email matches a basic email pattern, age is an integer between 0 and 120, status is one of active|paused|deleted.
Add cross-field rules only where they matter. These checks depend on multiple fields, and bugs hide here. Classic examples: startDate must be before endDate, or total equals subtotal + tax - discount. Write these rules so they can point to specific fields, not just “record invalid”.
Separate record-level rules from file-level rules. A record-level rule checks one row (CSV) or one object (JSON). A file-level rule checks the whole upload: required headers exist, a unique key doesn’t repeat across rows, column count matches expectations, or the file declares a supported version.
Normalization should be explicit, not “magic”. Decide what you normalize before validating, and document it. Common examples include trimming spaces, Unicode normalization (so visually identical characters compare the same), and formatting phone numbers into one consistent storage format.
A structure that stays readable:

- Normalize first (explicitly), then validate.
- Field-level rules: one value, pass/fail.
- Cross-field rules: several values, pointing at specific fields.
- Record-level rules: one row or object.
- File-level rules: headers, unique keys, column counts, versions.
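The layering might look like this in Python; the rule names and registry shape are illustrative, not a prescribed API:

```python
import re

# Field-level rules: one value in, an error code or None out.
def check_email(value):
    return None if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) else "INVALID_EMAIL"

def check_status(value):
    return None if value in {"active", "paused", "deleted"} else "INVALID_STATUS"

FIELD_RULES = {"email": check_email, "status": check_status}

# Cross-field rules: several values in, and they name the fields at fault.
def check_date_order(record):
    if record["startDate"] > record["endDate"]:  # ISO dates compare lexically
        return {"code": "DATE_ORDER", "fields": ["startDate", "endDate"]}
    return None

def validate_record(record):
    """Record-level validation: run field rules, then cross-field rules."""
    errors = []
    for field, rule in FIELD_RULES.items():
        code = rule(record.get(field, ""))
        if code:
            errors.append({"code": code, "fields": [field]})
    issue = check_date_order(record)
    if issue:
        errors.append(issue)
    return errors
```

Because each rule is a small named function, a test can target exactly one behavior, and a review can see exactly what changed.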
Version your rules. Put a schemaVersion (or import “profile”) in the file or API request. When you change what “valid” means, you can still re-import older exports using the older version. That one choice prevents a lot of “it used to work yesterday” tickets.
A good importer fails in a helpful way. Vague errors lead to random retries and avoidable support work. A clear error format helps users fix the file quickly, and helps you improve validation without breaking clients.
Start with a stable error object shape and keep it consistent across CSV and JSON. You can use Claude Code to propose a schema and a few realistic examples, then lock it down as part of the import contract.
Treat each error as a small record with fields that don’t change. The message can evolve, but the code and location should stay stable.
- code: a short, stable identifier like REQUIRED_MISSING or INVALID_DATE
- message: a human-friendly sentence for the UI
- path: where the problem is (a JSON pointer like /customer/email, or a column name like email)
- row or line: for CSV, include the 1-based row number (and optionally the original line)
- severity: at least error and warning

Make errors actionable. Include what you expected and what you actually saw, and when possible show an example that would pass. For instance: expected YYYY-MM-DD, got 03/12/24.
Even if you return a flat list, include enough data to group errors by row and by field. Many UIs want “Row 12 has 3 issues” and then highlight each column. Support teams like grouping because patterns become obvious (for example, every row is missing country).
A compact response might look like this:
{
  "importId": "imp_123",
  "status": "failed",
  "errors": [
    {
      "code": "INVALID_DATE",
      "message": "Signup date must be in YYYY-MM-DD.",
      "path": "signup_date",
      "row": 12,
      "severity": "error",
      "expected": "YYYY-MM-DD",
      "actual": "03/12/24"
    },
    {
      "code": "UNKNOWN_FIELD",
      "message": "Column 'fav_colour' is not recognized.",
      "path": "fav_colour",
      "row": 1,
      "severity": "warning"
    }
  ]
}
Plan for localization without changing error codes. Keep code language-neutral and durable, and treat message as replaceable text. If later you add messageKey or translated messages, old clients can still rely on the same codes for filtering, grouping, and analytics.
To avoid “mystery imports,” your API response should answer two questions: what happened, and what should the user do next.
Even when there are errors, return a consistent summary so the UI and support tooling can handle every import the same way.
Include:
- created, updated, skipped, failed counts
- totalRows (or totalRecords for JSON)
- mode (for example: "createOnly", "upsert", or "updateOnly")
- startedAt and finishedAt timestamps
- a correlationId support can ask for

That correlationId is worth it. When someone reports “it didn’t import,” you can find the exact run and error report without guessing.
Don’t dump 10,000 row errors into the response. Return a small sample (say 20) that shows the pattern, and provide a separate way to retrieve the full report if needed.
Make each error specific and stable: include the row, the field, a stable code, a human-readable message, and the offending value.
Example response shape (success with some row failures):
{
  "importId": "imp_01HZY...",
  "correlationId": "c_9f1f2c2a",
  "status": "completed_with_errors",
  "summary": {
    "totalRows": 1200,
    "created": 950,
    "updated": 200,
    "skipped": 10,
    "failed": 40
  },
  "errorsSample": [
    {
      "row": 17,
      "field": "email",
      "code": "invalid_format",
      "message": "Email must contain '@'.",
      "value": "maria.example.com"
    }
  ],
  "report": {
    "hasMore": true,
    "nextPageToken": "p_002"
  },
  "next": {
    "suggestedAction": "review_errors"
  }
}
Notice the next field. Even a minimal success payload should help the product move forward: show a review screen, offer a retry, or open the imported collection.
People retry. Networks fail. If the same file is imported twice, you want predictable results.
Be explicit about idempotency: accept an idempotencyKey (or compute a file hash), and return the existing importId if the request is a repeat. If your mode is upsert, define the matching rule (for example, “email is the unique key”). If it’s create-only, return “skipped” for duplicates, not “created again.”
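One way to sketch content-hash idempotency in Python; the in-memory registry here stands in for whatever store your importer actually uses:

```python
import hashlib

def idempotency_key_for(file_bytes: bytes, mode: str) -> str:
    """Derive a stable key from file content plus import mode, so the
    same upload in the same mode maps to the same import run."""
    digest = hashlib.sha256(file_bytes).hexdigest()
    return f"{mode}:{digest}"

# Hypothetical run registry: a repeated request returns the existing importId.
_runs = {}

def start_import(file_bytes: bytes, mode: str, new_import_id: str) -> str:
    key = idempotency_key_for(file_bytes, mode)
    if key in _runs:
        return _runs[key]  # repeat request: same importId, no duplicate work
    _runs[key] = new_import_id
    return new_import_id
```

Including the mode in the key is a deliberate choice: the same file in upsert mode and createOnly mode are different operations.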
If the whole request is invalid (bad auth, wrong content type, unreadable file), fail fast and return status: "rejected" with a short error list. If the file is valid but has row-level problems, treat it as a completed job with failed > 0 so users can fix and re-upload without losing the summary.
A useful habit: make the model write the contract in a structured format, not as prose. “Helpful paragraphs” often skip details like trimming rules, default values, and whether a blank cell means “missing” or “empty”.
Use a prompt that forces a table a human can review quickly and a developer can turn into code. Ask for each field’s rule, pass and fail examples, and an explicit note for anything ambiguous (for example, empty string vs null).
You are helping design an importer for CSV and JSON.
Output a Markdown table with columns:
Field | Type | Required? | Normalization | Validation rules | Default | Pass examples | Fail examples
Rules must be testable (no vague wording).
Then output:
1) A list of edge cases to test (CSV + JSON).
2) Proposed test names with expected result (pass/fail + error code).
Finally, list any contradictions you notice (required vs default, min/max vs examples).
After the first draft, tighten it by asking for one positive and one negative example per rule. That pushes coverage of tricky corners like empty strings, whitespace-only values, missing columns, null vs "null", very large integers, scientific notation, duplicate IDs, and extra JSON fields.
For a concrete scenario, imagine importing “customers” from CSV: email is required, phone is optional, and signup_date defaults to today if missing. The model should flag a contradiction if you also say “signup_date is required”. It should propose tests like import_customers_missing_email_returns_row_error and specify the error code and message shape you return.
Do one more pass before implementation: ask the model to restate the rules as a checklist and point out where defaults, required fields, and normalization might conflict. That review step catches a lot of ticket-worthy behavior.
Fuzz testing stops “weird files” from becoming support tickets. Start from a small set of known-good CSV/JSON files, then generate thousands of slightly broken variations and make sure your importer reacts safely and clearly.
Start with a small seed corpus of valid examples that represent real usage: the smallest valid file, a typical file, and a large file. For JSON, include one object, many objects, and nested structures if you support them.
Then add an automated mutator that tweaks one thing at a time. Keep mutations reproducible by logging the random seed so you can replay failures.
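A minimal Python sketch of such a mutator; the mutation list is illustrative, and the key point is that the same seed always reproduces the same broken file:

```python
import random

MUTATIONS = ["drop_header", "swap_delimiter", "truncate_last_row", "blank_random_cell"]

def mutate_csv(text: str, seed: int):
    """Apply one random mutation to a known-good CSV. Logging the seed
    makes failures replayable: mutate_csv(sample, seed) is deterministic."""
    rng = random.Random(seed)  # isolated RNG, deterministic per seed
    mutation = rng.choice(MUTATIONS)
    lines = text.splitlines()
    if mutation == "drop_header":
        lines = lines[1:]
    elif mutation == "swap_delimiter":
        lines = [line.replace(",", ";") for line in lines]
    elif mutation == "truncate_last_row":
        lines[-1] = lines[-1][: len(lines[-1]) // 2]
    elif mutation == "blank_random_cell":
        row = rng.randrange(len(lines))
        cells = lines[row].split(",")
        cells[rng.randrange(len(cells))] = ""
        lines[row] = ",".join(cells)
    return mutation, "\n".join(lines)
```

In CI you would loop over thousands of seeds, feed each mutated file to the importer, and record any seed whose result violates the pass rules.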
Fuzz dimensions that catch most real-world problems:

- Delimiters and quoting: commas vs semicolons, unbalanced quotes, embedded newlines
- Encodings: UTF-8 with and without a BOM, stray non-breaking spaces
- Shape: missing, extra, or duplicate headers; truncated rows; wrong column counts
- Values: empty strings, whitespace-only cells, null vs "null", very large integers, scientific notation
- JSON structure: extra fields, unexpected nesting, NaN/Infinity
Don’t stop at syntax. Add semantic fuzz too: swap similar fields (email vs username), extreme dates, duplicate IDs, negative quantities, or values that violate enums.
Fuzz tests only help if pass criteria are strict. Your importer should never crash or hang, and errors should be consistent and actionable.
A practical set of pass rules:

- The importer never crashes or hangs, no matter the input.
- Every failure produces the standard error shape with a stable code.
- The same input always produces the same output (deterministic).
- Errors point at a row and field whenever one exists.
Run these tests in CI on every change. When you find a failure, save the exact file as a fixture and add a regression test so it never returns.
If you use Claude Code for this work, have it generate seed fixtures that match your contract, a mutation plan, and the expected error outputs. You still choose the rules, but you get a wide test surface fast, especially for CSV quoting and JSON corner cases.
Most import tickets come from unclear rules and unhelpful feedback.
One common trap is “best effort” parsing that isn’t written down. If your importer silently trims spaces, accepts both commas and semicolons, or guesses date formats, users build workflows around those guesses. Then a small change, or a different file generator, breaks everything. Pick the behavior, document it, and test it.
Another repeat offender is the generic error message. “Invalid CSV” or “Bad request” forces users to guess. They upload the same file five times, and support ends up asking for the file anyway. Errors should point to a row, a field, a clear reason, and a stable code.
Failing the whole file for one bad row is also a frequent pain point. Sometimes that’s correct (for example, financial imports where partial data is dangerous). Many business imports can continue and report a summary, as long as you offer an explicit choice like strict mode vs partial import.
Text encoding issues create stubborn tickets. UTF-8 is the right default, but real CSVs often include a BOM, curly quotes, or non-breaking spaces copied from spreadsheets. Handle these consistently and report what you detected so users can fix their export settings.
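For illustration, a Python pre-pass that strips a UTF-8 BOM and replaces common spreadsheet artifacts while recording what it found; the exact replacement set is an assumption your contract should pin down:

```python
# Hypothetical pre-pass: normalize spreadsheet artifacts before parsing,
# and report what was detected so users can fix their export settings.
REPLACEMENTS = {
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u00a0": " ",                  # non-breaking space
}

def preprocess(raw: bytes):
    detected = []
    if raw.startswith(b"\xef\xbb\xbf"):
        detected.append("utf8_bom")
        raw = raw[3:]
    text = raw.decode("utf-8")  # contract says UTF-8 only; anything else is rejected
    for bad, good in REPLACEMENTS.items():
        if bad in text:
            detected.append(f"replaced_{hex(ord(bad))}")
            text = text.replace(bad, good)
    return text, detected
```

Returning the `detected` list is what turns a silent fix into something you can surface in the import summary.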
Finally, changing error codes between releases breaks clients and automations. Improve wording if you want, but keep codes and meanings stable. Only version them when you truly have to.
Traps worth guarding against up front:

- Undocumented “best effort” parsing that users build workflows around
- Generic errors (“Invalid CSV”) that force guessing
- All-or-nothing failure with no strict-vs-partial choice
- Encoding surprises: BOM, curly quotes, non-breaking spaces
- Error codes that change between releases
Example: a customer exports a CSV from Excel, which adds a BOM and formats dates as 03/04/2026. Your importer guesses MM/DD, but the customer expected DD/MM. If your error report includes the detected format, the exact field, and a suggested fix, the user can correct it without back-and-forth.
Most import problems are small mismatches between what users think the file means and what your system accepts. Treat this as a release gate.
A practical test: use one intentionally messy file. Example: a CSV where the header appears twice (two “email” columns), a boolean field uses “Y”, and a date is “03/04/05”. Your importer shouldn’t guess. It should either apply a documented mapping rule or reject with a specific error.
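Duplicate-header detection is a small file-level rule; a sketch in Python, with simplified error shapes:

```python
import csv
import io

def check_headers(csv_text: str, expected: set) -> list:
    """File-level rule: duplicate or unknown headers are reported up front,
    never silently merged or guessed."""
    header = next(csv.reader(io.StringIO(csv_text)))
    errors = []
    seen = set()
    for col in header:
        if col in seen:
            errors.append({"code": "DUPLICATE_HEADER", "column": col})
        seen.add(col)
        if col not in expected:
            errors.append({"code": "UNKNOWN_HEADER", "column": col})
    return errors
```

Running this before any row parsing means a structurally broken file fails once, loudly, instead of producing thousands of confusing row errors.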
Two checks teams often skip:
First, verify your importer reports errors with enough location detail to fix the source file. “Invalid date” isn’t actionable. “Row 42, column start_date: expected YYYY-MM-DD, got 03/04/05” is.
Second, run the same invalid file twice and compare results. If error order changes, codes change, or row numbers drift, users lose trust. Deterministic behavior is boring, and that’s the point.
A common real-world import is customer orders coming from a spreadsheet export. Someone exports a CSV from an old system, edits it in Excel, then uploads it. Most tickets happen when the importer silently “fixes” data, or when the error message doesn’t say what to change.
Imagine a file named orders.csv with columns: order_id,customer_email,order_date,currency,total_amount.
Here are the rows as the user would see them (one valid, two realistically broken):
order_id,customer_email,order_date,currency,total_amount
A-1001,ana@example.com,2026-01-05,USD,129.99
A-1002,not-an-email,01/06/2026,USD,49.00
,sam@example.com,2026-01-07,US, -10
Row 2 has an invalid email and an ambiguous date format. Row 3 is missing order_id, has an unsupported currency code (US instead of USD), and a negative amount.
If your API returns errors, keep the shape consistent and specific. Here’s an example response that supports partial success:
{
  "correlation_id": "imp_20260109_7f3a9d",
  "import_id": "ord_01HZZ...",
  "status": "partial_success",
  "summary": {
    "total_rows": 3,
    "imported_rows": 1,
    "failed_rows": 2
  },
  "errors": [
    {
      "row_number": 2,
      "field": "customer_email",
      "code": "invalid_email",
      "message": "Email must contain a valid domain.",
      "value": "not-an-email"
    },
    {
      "row_number": 2,
      "field": "order_date",
      "code": "invalid_date_format",
      "message": "Use ISO-8601 (YYYY-MM-DD).",
      "value": "01/06/2026"
    },
    {
      "row_number": 3,
      "field": "order_id",
      "code": "required",
      "message": "order_id is required.",
      "value": ""
    },
    {
      "row_number": 3,
      "field": "currency",
      "code": "unsupported_currency",
      "message": "Allowed values: USD, EUR, GBP.",
      "value": "US"
    },
    {
      "row_number": 3,
      "field": "total_amount",
      "code": "must_be_positive",
      "message": "total_amount must be greater than 0.",
      "value": " -10"
    }
  ],
  "retry": {
    "mode": "upload_failed_only",
    "failed_row_numbers": [2, 3]
  }
}
Partial success matters because users shouldn’t have to re-upload the entire file. A simple retry flow is: fix only the failed rows, export a small CSV containing rows 2 and 3, and re-upload. Your importer should treat this as idempotent when order_id is present, so “retry” updates the same records instead of creating duplicates.
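Building that smaller retry file can be as simple as this Python sketch; it assumes 1-based data-row numbers that exclude the header, matching failed_row_numbers in the response:

```python
def build_retry_csv(original: str, failed_rows: list) -> str:
    """Produce a retry file containing the header plus only the failed rows.
    failed_rows holds 1-based data-row numbers (header excluded)."""
    lines = original.splitlines()
    header, data = lines[0], lines[1:]
    keep = [data[n - 1] for n in failed_rows]
    return "\n".join([header] + keep)
```

Whether the product generates this file for the user or the user edits it by hand, the row numbers in the error report must use the same convention, or the fix loop breaks.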
For support, correlation_id is the fastest path to diagnosis. A support agent can ask for that single value, find the import run in logs, and confirm whether the parser saw extra columns, a wrong delimiter, or unexpected encoding.
Next steps that make this repeatable:

- Write the import contract down and version it.
- Keep failing files as fixtures with regression tests.
- Run fuzz tests in CI on every change.
- Give support a correlationId they can always ask for.
Most failures come from messy real-world data, not “bad code.” CSV issues are usually about shape (headers, delimiter, quoting, encoding), while JSON issues are usually about meaning (types, null vs empty, unexpected nesting). Treat both as untrusted input and validate against an explicit contract.
Define three outcomes up front:

- Full success: everything imported.
- Full rejection: nothing imported, with a clear report.
- Partial import: valid rows in, failed rows reported.
Pick a default (many products choose partial) and make it consistent across UI and API.
Write down an import contract before writing validation:

- Accepted formats and encodings
- Required, optional, and defaulted fields
- Normalization and tolerance rules
- Duplicate definition and handling
- A schemaVersion so older exports stay importable
This prevents “it worked yesterday” surprises when behavior changes.
Default to one unambiguous format per field (for example, dates as YYYY-MM-DD). If you accept variants, make it explicit and predictable (for example, accept true/false/1/0, but not every spreadsheet guess). Avoid guessing ambiguous dates like 01/02/03; either require ISO format or reject with a clear error.
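A strict ISO-date rule in Python might look like this; the error string format is illustrative:

```python
from datetime import date

def parse_iso_date(raw: str) -> date:
    """Accept only YYYY-MM-DD. Ambiguous forms like 01/02/03 are rejected
    with a stable code instead of being guessed."""
    try:
        return date.fromisoformat(raw.strip())
    except ValueError:
        raise ValueError(f"INVALID_DATE_FORMAT: expected YYYY-MM-DD, got {raw!r}")
```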
Decide:

- What counts as a duplicate (same email, same external_id, or a combination of fields)
- Where you detect it (within the file, against existing data, or both)
- What happens when it occurs (keep first, keep last, merge, or reject)
If users can retry imports, combine this with idempotency so the same upload doesn’t create duplicates.
Use layers instead of one giant validate():

- Field-level rules (type, range, allowed values)
- Cross-field rules (startDate before endDate)
- Record-level rules (one row or object)
- File-level rules (headers, unique keys, versions)
Small named rules are easier to test and safer to change.
Return a stable error shape with:

- code (a stable identifier)
- message (human-friendly)
- path/field (column name or JSON pointer)
- row/line (for CSV)
- severity (error vs warning)

Make it actionable by including what was expected and what was received when possible.

Always return a consistent summary, even when there are errors:

- created, updated, skipped, and failed counts, plus totalRows/totalRecords
- status (success, rejected, completed_with_errors)
- timestamps (startedAt, finishedAt)
- correlationId for support/debugging

For large files, include a small errorsSample and a way to fetch the full report later.

Support retries explicitly:

- Accept an idempotencyKey (or use a file hash)
- Return the same importId if the same request is repeated

Without this, normal user retries can double-create records.

Start with a few known-good seed files, then generate many small mutations (one change at a time):

- Delimiter and quoting changes
- Encoding changes (BOM, non-breaking spaces)
- Missing, extra, or duplicate headers
- Null vs empty vs "null", huge numbers, scientific notation (and NaN/Infinity in JSON)

A fuzz test “passes” when the importer never crashes/hangs and returns deterministic, actionable errors.