MIMO Claude Code Traces 1K: A Coding-Agent Trajectory Dataset

MIMO Claude Code Traces 1K is an open-source dataset of Claude Code-style coding-agent trajectories. It contains 1,017 complete traces generated with MiMo-V2.5-Pro, covering user coding tasks, multi-turn messages, tool schemas, assistant reasoning fields, tool calls, tool outputs, and metadata such as model name, category, duration, cost, token usage, and whether tools were used.

The dataset is designed for research around code agents: tool-use imitation, code-agent distillation, supervised fine-tuning, trajectory modeling, reasoning/tool-call alignment, and evaluation of software-engineering behavior.

[Figure: MiMo-V2.5-Pro benchmark results]

Why MIMO Claude Code Traces Exists

Modern coding agents are not just code generators. They inspect files, run shell commands, edit code, recover from tool errors, reason over long contexts, and gradually refine a solution across many turns. Training and evaluating this behavior requires more than isolated prompt-response pairs.

MIMO Claude Code Traces addresses this need with complete agent trajectories collected in a Claude Code-style environment. The traces capture:

multi-turn task decomposition and iterative implementation;
file reading, editing, searching, and shell execution;
tool-call planning and recovery from failed commands;
debugging, refactoring, API integration, and devops workflows;
long-context reasoning over code, logs, and intermediate observations;
token, cost, and duration metadata for efficiency-aware analysis.

By preserving full event streams instead of only final answers, the dataset supports research on how code agents actually behave while solving tasks.

Model and Generation Setup

The traces were generated with mimo-v2.5-pro, MiMo’s most capable model at the time of release. MiMo-V2.5-Pro is a 1.02T-parameter Mixture-of-Experts model with 42B active parameters, a hybrid-attention architecture, and a 1M-token context window.

It is designed for agentic workloads, complex software engineering, and long-horizon tasks, with improved instruction following and coherence across ultra-long contexts. In tool-using harnesses, it can sustain complex trajectories spanning hundreds to more than a thousand tool calls.

[Figure: MiMo-V2.5-Pro token efficiency]

The dataset was produced in an agentic coding setup with tools such as Bash, Read, Write, Edit, Glob, Grep, TodoWrite, and planning utilities. The dataset construction process used approximately 400M tokens in total, while the released trace metadata records approximately 127.2M logged usage tokens across input, cache-read, and output token fields.

Dataset Overview

Each .jsonl file contains one complete Claude Code-style event stream. The release is organized by task category under session/.

Key Statistics

Statistic	Value
Total traces	1,017
Total JSONL files	1,017
Model	`mimo-v2.5-pro`
Generation budget	~400M tokens
Logged usage tokens	127,236,485
Claude Code-style event rows	15,046
Conversation messages	11,995
Assistant tool calls	5,271
Tool result messages	5,271
Traces with tool calls	859
Traces with reasoning fields	1,017
Recorded turns	4,932
Recorded duration	~20.5 hours
Recorded API cost field total	$163.89

Logged Usage Tokens

Token field	Count
`input_tokens`	8,033,778
`cache_read_input_tokens`	117,286,784
`cache_creation_input_tokens`	0
`output_tokens`	1,915,923
Total logged usage tokens	127,236,485
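
The per-field totals above can in principle be recomputed from the traces. The following is a minimal sketch, assuming each assistant event carries a flat `message.usage` object that uses the four field names from the table. `sum_usage` is a hypothetical helper, not part of the dataset tooling, and the exact shape of `usage` is an assumption.

```python
from collections import Counter

# Field names taken from the "Logged Usage Tokens" table.
USAGE_FIELDS = (
    "input_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
    "output_tokens",
)


def sum_usage(events):
    """Sum the logged usage fields over assistant events (hypothetical helper)."""
    totals = Counter({field: 0 for field in USAGE_FIELDS})
    for event in events:
        if event.get("type") != "assistant":
            continue
        # Assumption: usage is a flat dict stored under message.usage.
        usage = (event.get("message") or {}).get("usage") or {}
        for field in USAGE_FIELDS:
            value = usage.get(field)
            if isinstance(value, int):
                totals[field] += value
    totals["total"] = sum(totals[field] for field in USAGE_FIELDS)
    return dict(totals)
```

If the same usage record is repeated on several events of one model response, a plain sum like this would overcount, so the result may need deduplication before it matches the table.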

Dataset Structure

The dataset is organized as a top-level README plus category folders. Each category folder contains JSONL traces for one type of coding-agent task.

mimo-claude-code-traces-1k/
├── README.md
└── session/
    ├── algorithms/
    │   └── *.jsonl
    ├── api_integration/
    │   └── *.jsonl
    ├── code_generation/
    │   └── *.jsonl
    ├── data_processing/
    │   └── *.jsonl
    ├── debugging/
    │   └── *.jsonl
    ├── hf_trace/
    │   └── *.jsonl
    ├── math_problems/
    │   └── *.jsonl
    ├── refactoring/
    │   └── *.jsonl
    ├── shell_devops/
    │   └── *.jsonl
    └── supplement/
        └── *.jsonl
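
Because each file is one trace and each folder is one category, the layout above maps naturally onto a per-category loader. The following is a minimal sketch, assuming the repository has been downloaded to a local directory with the structure shown. `load_traces` is a hypothetical helper name.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_traces(root):
    """Return {category: [(path, events), ...]}, keeping one event list per file."""
    traces = defaultdict(list)
    session_dir = Path(root) / "session"
    for path in sorted(session_dir.glob("*/*.jsonl")):
        with open(path, "r", encoding="utf-8") as f:
            # One JSON object per non-empty line.
            events = [json.loads(line) for line in f if line.strip()]
        # The parent folder name is the task category, e.g. "debugging".
        traces[path.parent.name].append((path, events))
    return dict(traces)
```

Keeping events grouped per file preserves trace boundaries, which matters for any analysis that follows a single session from the first prompt to the last tool result.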

Category Distribution

The ten categories cover algorithmic tasks, API integration, code generation, data processing, debugging, Hugging Face traces, math-heavy coding prompts, refactoring, shell/devops, and a supplementary set.

Category	Traces	Messages	Tool calls	Traces with tools	Turns
`algorithms`	157	1,853	722	148	854
`api_integration`	23	1,300	984	23	214
`code_generation`	213	3,246	1,429	213	1,448
`data_processing`	58	885	442	58	345
`debugging`	162	941	252	96	380
`hf_trace`	57	637	316	36	225
`math_problems`	76	745	260	76	332
`refactoring`	126	796	216	64	339
`shell_devops`	70	901	416	70	486
`supplement`	75	691	234	75	309
Total	1,017	11,995	5,271	859	4,932

Token and Cost by Category

Category	Logged tokens	Output tokens	Duration (ms)	Cost (USD)
`algorithms`	23,302,782	335,330	12,891,693	29.179555
`api_integration`	5,273,760	52,907	3,989,450	14.793162
`code_generation`	41,079,654	771,091	24,428,456	51.162873
`data_processing`	8,638,826	112,405	4,548,507	10.566932
`debugging`	9,390,091	133,225	6,306,098	11.999537
`hf_trace`	4,365,211	82,627	4,164,129	10.435188
`math_problems`	8,867,295	128,300	5,003,409	9.130514
`refactoring`	8,571,921	101,568	5,166,704	10.012703
`shell_devops`	9,940,490	126,358	4,455,335	10.254762
`supplement`	7,806,455	72,112	2,985,143	6.351987
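
The per-category figures can be cross-checked against the Key Statistics. The short sketch below copies the rows from the table above and sums each column. The variable names are illustrative only.

```python
# Rows copied from the "Token and Cost by Category" table:
# (logged tokens, output tokens, duration in ms, cost in USD)
ROWS = {
    "algorithms": (23_302_782, 335_330, 12_891_693, 29.179555),
    "api_integration": (5_273_760, 52_907, 3_989_450, 14.793162),
    "code_generation": (41_079_654, 771_091, 24_428_456, 51.162873),
    "data_processing": (8_638_826, 112_405, 4_548_507, 10.566932),
    "debugging": (9_390_091, 133_225, 6_306_098, 11.999537),
    "hf_trace": (4_365_211, 82_627, 4_164_129, 10.435188),
    "math_problems": (8_867_295, 128_300, 5_003_409, 9.130514),
    "refactoring": (8_571_921, 101_568, 5_166_704, 10.012703),
    "shell_devops": (9_940_490, 126_358, 4_455_335, 10.254762),
    "supplement": (7_806_455, 72_112, 2_985_143, 6.351987),
}

total_tokens = sum(row[0] for row in ROWS.values())
total_output = sum(row[1] for row in ROWS.values())
total_hours = sum(row[2] for row in ROWS.values()) / 3_600_000  # ms -> hours
total_cost = sum(row[3] for row in ROWS.values())

print(total_tokens)           # 127236485
print(total_output)           # 1915923
print(round(total_hours, 1))  # 20.5
print(round(total_cost, 2))   # 163.89
```

The sums agree with the logged usage tokens, output tokens, recorded duration, and recorded cost reported earlier.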

Tool Use

MIMO Claude Code Traces captures explicit tool-call behavior, including successful calls and tool error messages. This makes it useful for learning when to call tools, how to recover from failed calls, and how to combine shell/file operations with natural-language reasoning.

Tool	Calls
`Bash`	1,805
`Read`	1,480
`Write`	919
`Glob`	381
`Edit`	339
`Grep`	163
`Agent`	53
`EnterPlanMode`	38
`ExitPlanMode`	36
`AskUserQuestion`	28
`TodoWrite`	25
`TaskOutput`	2
`WebFetch`	1
`TaskStop`	1
Total	5,271

Available tool schemas are included in every trace. The common Claude Code-style tool inventory includes:

Task, AskUserQuestion, Bash, CronCreate, CronDelete, CronList, Edit,
EnterPlanMode, EnterWorktree, ExitPlanMode, ExitWorktree, Glob, Grep,
NotebookEdit, Read, ScheduleWakeup, Skill, TaskOutput, TaskStop,
TodoWrite, WebFetch, WebSearch, Write

Event Stream Schema

Each .jsonl file in session/<category>/ is one Claude Code-style event stream. Each line is one event.

Common top-level fields include:

Field	Type	Description
`type`	string	Event type, such as `mode`, `permission-mode`, `user`, `assistant`, `last-prompt`, or `system`
`sessionId`	string	Session identifier
`uuid`	string	Event UUID. Missing original UUIDs are deterministically generated during conversion
`parentUuid`	string/null	Parent event UUID for message-chain reconstruction
`timestamp`	string	Event timestamp
`cwd`	string	Working directory recorded or normalized for the trace
`version`	string	Dataset conversion/version marker
`message`	object	User, assistant, or tool-result message payload
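
The `uuid` and `parentUuid` fields allow a message chain to be rebuilt from any event. The following is a minimal sketch, assuming `parentUuid` is `null` at the root and that all events come from a single trace. `message_chain` is a hypothetical helper.

```python
def message_chain(events, leaf_uuid):
    """Walk parentUuid links from a leaf event back to the root.

    Returns the events in root-to-leaf order.
    """
    # Rows without a uuid (e.g. the mode row in the examples) are skipped.
    by_uuid = {e["uuid"]: e for e in events if e.get("uuid")}
    chain = []
    seen = set()
    current = leaf_uuid
    while current is not None and current in by_uuid and current not in seen:
        seen.add(current)  # guard against accidental cycles
        event = by_uuid[current]
        chain.append(event)
        current = event.get("parentUuid")
    chain.reverse()
    return chain
```

Since these identifiers are reconstructed during conversion, the chain reflects the normalized message order rather than raw runtime lineage.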

Assistant events store Claude-style message content blocks:

Content block	Description
`text`	Assistant natural-language response
`thinking`	Reasoning content from the original trace
`tool_use`	Tool call with `id`, `name`, and `input`

Tool outputs are represented as user events whose message.content contains tool_result blocks linked by tool_use_id.
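
The link between a `tool_use` block and its `tool_result` can be sketched as follows. This is an illustrative helper (`pair_tool_calls` is a hypothetical name) that assumes the block fields described above and should be applied to one trace at a time, since call ids are not guaranteed to be unique across traces.

```python
def pair_tool_calls(events):
    """Match each tool_use block with its tool_result via tool_use_id."""
    calls = {}  # tool_use id -> record, kept in call order
    for event in events:
        content = (event.get("message") or {}).get("content")
        if not isinstance(content, list):
            continue  # plain-string user prompts carry no blocks
        for block in content:
            if not isinstance(block, dict):
                continue
            kind = block.get("type")
            if event.get("type") == "assistant" and kind == "tool_use":
                calls[block.get("id")] = {
                    "name": block.get("name"),
                    "input": block.get("input"),
                    "result": None,
                    "is_error": None,
                }
            elif event.get("type") == "user" and kind == "tool_result":
                record = calls.get(block.get("tool_use_id"))
                if record is not None:
                    record["result"] = block.get("content")
                    # Assumption: a missing is_error flag means success.
                    record["is_error"] = block.get("is_error", False)
    return list(calls.values())
```

Records whose `result` stays `None` correspond to calls with no matching result in the trace, and filtering on `is_error` gives a starting point for studying failed calls per tool.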

Event-Type Counts

Event type	Count
`mode`	1,017
`permission-mode`	1,017
`user`	6,288
`assistant`	4,690
`last-prompt`	1,017
`system`	1,017
Total	15,046

Example Event Lines

{"type":"mode","mode":"normal","sessionId":"7469deea-7e45-4732-8f06-9666d52052d4"}
{"type":"user","message":{"role":"user","content":"Implement Kahn's algorithm for topological sort..."},"sessionId":"7469deea-7e45-4732-8f06-9666d52052d4","uuid":"...","parentUuid":null}
{"type":"assistant","message":{"model":"mimo-v2.5-pro","role":"assistant","content":[{"type":"thinking","thinking":"..."},{"type":"text","text":"Let me explore the codebase."},{"type":"tool_use","id":"call_...","name":"Bash","input":{"command":"ls /data/agent/choucisan"}}]},"sessionId":"7469deea-7e45-4732-8f06-9666d52052d4","uuid":"...","parentUuid":"..."}
{"type":"user","message":{"role":"user","content":[{"type":"tool_result","tool_use_id":"call_...","content":"...","is_error":false}]},"sessionId":"7469deea-7e45-4732-8f06-9666d52052d4","uuid":"...","parentUuid":"..."}

Some fields are normalized because the original collected data did not include the full Claude Code runtime envelope. In particular, uuid, parentUuid, requestId, cwd, version, and gitBranch are deterministic conversion fields rather than raw Claude Code runtime fields.

Highlights

Complete trajectories: Each file contains a full coding-agent event stream, not just a final answer.
Reasoning fields: All 1,017 traces include assistant reasoning fields.
Tool-rich behavior: 859 traces include tool calls, with 5,271 assistant tool calls in total.
Claude Code-style format: Events follow a familiar agent trace shape with assistant, user, tool_use, and tool_result blocks.
Broad task coverage: Categories include algorithms, debugging, refactoring, shell/devops, API integration, data processing, and math-heavy coding prompts.
Efficiency metadata: Token usage, duration, and recorded cost fields support cost-aware and latency-aware analysis.
Recovery behavior: Tool errors, failed edits, missing files, and shell diagnostics are retained for studying agent robustness.

Quick Start

Load the dataset directly from Hugging Face:

from datasets import load_dataset

repo_id = "choucsan/mimo-claude-code-traces-1k"
dataset = load_dataset(repo_id, data_files="session/**/*.jsonl")

print(dataset["train"][0])

Read local JSONL files:

import json
from pathlib import Path

root = Path("mimo-claude-code-traces-1k")
files = sorted((root / "session").glob("*/*.jsonl"))

events = []
for path in files:
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                events.append(json.loads(line))

print(len(events))
print(events[0])

Count tool calls:

from collections import Counter

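tool_counts = Counter()

# `events` is the flat list built in the previous snippet.
for event in events:
    message = event.get("message") or {}
    # User prompts may store content as a plain string; only dict blocks are inspected.
    for block in message.get("content", []) or []:
        if isinstance(block, dict) and block.get("type") == "tool_use":
            tool_counts[block.get("name")] += 1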

print(tool_counts.most_common())

Applications

MIMO Claude Code Traces can be used in several research and development settings.

Application	How MIMO Claude Code Traces Helps
Code-agent distillation	Distills `mimo-v2.5-pro` agent behavior into smaller code models
Supervised fine-tuning	Trains coding assistants on task-to-trajectory data
Tool-call prediction	Learns when to call shell, read, write, edit, grep, or planning tools
Reasoning/tool alignment	Connects assistant reasoning fields with subsequent tool-use decisions
Offline RL	Provides complete trajectories for tool-using code-agent policy learning
Reward modeling	Supports trace-level and step-level preference modeling for tool choice, edit quality, or task completion
Debugging research	Preserves failed commands, shell diagnostics, tool errors, and recovery attempts
Refactoring and cleanup	Captures multi-step codebase edits and iterative improvements
Shell and devops workflows	Includes command-line operations, file inspection, and execution feedback
Cost-aware agents	Uses token, duration, and cost metadata for efficiency-aware modeling
Evaluation harnesses	Benchmarks parsers, function-calling policies, and Claude Code-style trace consumers
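
For supervised fine-tuning or distillation, a trace usually has to be flattened into a message list first. The sketch below shows one possible conversion. The output layout (`role`, `content`, `tool_call`, `tool_call_id`) and the `assistant_thinking` role are assumptions chosen for illustration, not a format prescribed by the dataset, and `to_chat_messages` is a hypothetical helper.

```python
import json


def to_chat_messages(events, keep_thinking=False):
    """Flatten one trace into role/content messages (hypothetical target format)."""
    messages = []
    for event in events:
        if event.get("type") not in ("user", "assistant"):
            continue  # skip mode, permission-mode, system, and last-prompt rows
        message = event.get("message") or {}
        role = message.get("role") or event.get("type")
        content = message.get("content")
        if isinstance(content, str):
            messages.append({"role": role, "content": content})
            continue
        for block in content or []:
            if not isinstance(block, dict):
                continue
            kind = block.get("type")
            if kind == "text":
                messages.append({"role": role, "content": block.get("text", "")})
            elif kind == "thinking" and keep_thinking:
                messages.append({"role": "assistant_thinking",
                                 "content": block.get("thinking", "")})
            elif kind == "tool_use":
                messages.append({"role": "assistant",
                                 "tool_call": {"id": block.get("id"),
                                               "name": block.get("name"),
                                               "input": block.get("input")}})
            elif kind == "tool_result":
                result = block.get("content")
                # Tool output may be a string or a structured value.
                text = result if isinstance(result, str) else json.dumps(result)
                messages.append({"role": "tool",
                                 "tool_call_id": block.get("tool_use_id"),
                                 "content": text})
    return messages
```

Whether to keep the reasoning blocks is a training-design choice, so the sketch leaves them out by default and exposes them through `keep_thinking`.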

Data Quality Notes

Each file in session/<category>/ is a Claude Code-style JSONL event stream for one trace.
All released traces report metadata.model = "mimo-v2.5-pro".
All 1,017 traces include reasoning fields.
859 traces include at least one tool call.
Tool outputs may include errors, failed edits, missing files, or shell diagnostics; these are retained because they are useful for training recovery behavior.
The dataset contains generated coding-agent traces, not verified production patches. Users should validate code before using examples as ground truth.
The event stream is converted from normalized trace records. Some Claude Code envelope fields are deterministic reconstruction fields because the original records did not include raw runtime UUIDs or local harness metadata.
The recorded token accounting in assistant message.usage reflects logged trace metadata and does not include all tokens used during the broader dataset construction process.

Citation and Contact

If MIMO Claude Code Traces helps your work, please consider linking back to the dataset page. For questions, corrections, or collaboration, contact choucisan@gmail.com.
