
What is JSONL?

JSONL (JSON Lines) is a file format for storing structured data as one JSON value per line, in practice almost always one JSON object per line. Each line is valid JSON on its own and does not depend on any other line. That makes JSONL simple to stream, easy to append to, and practical for large datasets. Files usually carry the “.jsonl” extension, and the format is also called “newline-delimited JSON” (NDJSON).

In machine learning and data annotation, JSONL is commonly used to store examples where each record represents one item to label, one labeled output, or one model prediction. Because each line is a self-contained object, tools can read and process records one at a time without loading the entire file into memory. That matters when you have millions of rows.
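Reading one record at a time can be sketched as a small generator. This is a minimal illustration, not a reference implementation: the function name iter_jsonl is hypothetical, and the sketch assumes a UTF-8 file in which every non-blank line holds one JSON object.

```python
import json
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record at a time from a JSONL file.

    Only the current line is held in memory, so the file can be
    far larger than available RAM.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines such as a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so a bad record is easy to find.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc
```

Because the function is a generator, a caller can stop after the first few records without the rest of the file ever being parsed.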

A typical JSONL record includes an identifier, the input payload, and fields for annotations or metadata. For text annotation, the record might contain the raw text plus spans for entities, labels, and annotator IDs. For image annotation, it might contain a URL to the image plus bounding boxes or polygons. For evaluation workflows, it might include both a model output and a human-verified expected output.
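To make those three record shapes concrete, here is a small sketch that builds one record of each kind and serializes it to a single line. All field names (entities, annotator_id, image_url, boxes, model_output, expected_output, verified_by) and the example.com URL are illustrative assumptions; real tools use their own names.

```python
import json

# Text annotation: raw text plus entity spans (end-exclusive character offsets).
text_record = {
    "id": "txt-001",
    "text": "Order shipped to Berlin.",
    "entities": [{"start": 17, "end": 23, "type": "LOCATION"}],
    "annotator_id": "ann-7",
}

# Image annotation: a URL to the image plus bounding boxes in pixels.
image_record = {
    "id": "img-001",
    "image_url": "https://example.com/images/001.jpg",
    "boxes": [{"x": 34, "y": 50, "width": 120, "height": 80, "label": "car"}],
}

# Evaluation: a model output next to a human-verified expected output.
eval_record = {
    "id": "eval-001",
    "model_output": "Paris",
    "expected_output": "Paris",
    "verified_by": "ann-3",
}

# One JSON object per line, each line terminated by a newline.
lines = [
    json.dumps(record, ensure_ascii=False)
    for record in (text_record, image_record, eval_record)
]
jsonl_text = "\n".join(lines) + "\n"
```

In practice a single file would usually hold records of one shape only; the three are combined here purely to keep the example short.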

The main advantage of JSONL over a single JSON array is operational. With a JSON array, you often have to parse the entire file before you can work with the first record, and appending new records can require rewriting the file. With JSONL, you can stream line by line, validate line by line, and write incremental outputs without expensive rewrites.
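The append case is short enough to show in full. In this sketch the helper name append_record is hypothetical; the point is that the file is opened in append mode, so existing lines are never read or rewritten.

```python
import json


def append_record(path: str, record: dict) -> None:
    """Append one record as a single line to a JSONL file.

    Opening with mode "a" writes at the end of the file, so the cost
    does not grow with the number of records already stored.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Doing the same with a single JSON array would typically mean loading the array, appending to it in memory, and writing the whole file back out.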

JSONL does have constraints. Every record must fit on one line, so pretty-printed (indented) JSON is not valid JSONL. Newlines inside long text fields must be escaped as \n; standard JSON serializers do this automatically, so the format usually breaks only when lines are assembled by hand. Also, JSONL files do not have a shared schema by default, so teams need conventions. If one dataset uses “label” and another uses “class,” you end up with inconsistent records. A small schema contract, even if informal, prevents that drift.
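Both constraints can be checked with a few lines of code. The sketch below is one possible approach under stated assumptions: the REQUIRED_FIELDS contract and the helper names to_jsonl_line and check_record are hypothetical, chosen only to illustrate the idea of an informal schema.

```python
import json
from typing import List

# Hypothetical contract: field name -> expected Python type.
REQUIRED_FIELDS = {"id": str, "text": str, "label": str}


def to_jsonl_line(record: dict) -> str:
    """Serialize a record to one line.

    json.dumps without an indent argument emits no raw newlines;
    newlines inside string values are escaped as \\n.
    """
    line = json.dumps(record, ensure_ascii=False)
    if "\n" in line:
        raise ValueError("serialized record spans more than one line")
    return line


def check_record(record: dict) -> List[str]:
    """Return a list of contract violations; an empty list means the record passes."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"{name}: expected {expected_type.__name__}")
    return problems
```

A record that uses “class” where the contract says “label” is reported as missing the label field, which surfaces the naming drift before it reaches training code.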

A simple example: imagine a named entity recognition dataset. Each line includes the text and an “entities” array in which every entity has start and end offsets plus a type. A training pipeline can read one line at a time, transform it into features, and train without holding everything in memory. If you later add a second annotation pass, you can add a “review” field to each record while leaving the original fields untouched.
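One line of such a dataset, and the later review pass, might look like the sketch below. The offsets are assumed to be end-exclusive character offsets (conventions differ between tools), and the contents of the “review” field (reviewer_id, approved) as well as both helper names are hypothetical.

```python
import json

# One line of a hypothetical NER dataset.
line = (
    '{"id": "doc-1", "text": "Ada Lovelace lived in London.", '
    '"entities": [{"start": 0, "end": 12, "type": "PERSON"}, '
    '{"start": 22, "end": 28, "type": "LOCATION"}]}'
)


def entity_surface_forms(record: dict) -> list:
    """Return (surface text, type) pairs by slicing the text with each span."""
    text = record["text"]
    return [
        (text[entity["start"]:entity["end"]], entity["type"])
        for entity in record["entities"]
    ]


def with_review(record: dict, reviewer_id: str, approved: bool) -> dict:
    """Return a copy of the record with a review field added.

    The original fields are carried over unchanged.
    """
    reviewed = dict(record)
    reviewed["review"] = {"reviewer_id": reviewer_id, "approved": approved}
    return reviewed


record = json.loads(line)
reviewed_line = json.dumps(with_review(record, "rev-2", True), ensure_ascii=False)
```

Slicing the text with the stored offsets is also a cheap sanity check: if the slice does not match the entity you expect, the offset convention in the file differs from the one your pipeline assumes.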

When choosing or integrating a data labeling platform, confirm that JSONL exports match the structure your pipeline expects. Export quality is not just file format. It is whether IDs are stable, whether label coordinates are consistent, whether reviewer and quality fields are preserved, and whether versioning is clear. Small mismatches create downstream friction, especially when you retrain models on a schedule.
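Some of these checks are easy to automate when a new export arrives. The following is a rough sketch rather than a complete validator: the function name export_report, the report keys, and the assumption that every record carries a string “id” are all illustrative.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Set


def export_report(
    lines: Iterable[str], required: Set[str], previous_ids: Set[str]
) -> Dict[str, list]:
    """Summarize basic export problems.

    Reports duplicate IDs, IDs present in the previous export but absent
    from this one, and records missing any required field.
    """
    id_counts: Counter = Counter()
    missing_fields = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if "id" in record:
            id_counts[record["id"]] += 1
        absent = required - set(record)
        if absent:
            missing_fields.append((line_number, sorted(absent)))
    return {
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n > 1),
        "ids_dropped_since_previous": sorted(previous_ids - set(id_counts)),
        "missing_fields": missing_fields,
    }
```

Running a report like this on every scheduled export, before retraining starts, turns silent mismatches into an explicit list of lines to investigate.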