MIMO Claude Code Traces 1K: A Coding-Agent Trajectory Dataset
MIMO Claude Code Traces 1K is an open-source dataset of Claude Code-style coding-agent trajectories. It contains 1,017 complete traces generated with MiMo-V2.5-Pro, covering user coding tasks, multi-turn messages, tool schemas, assistant reasoning fields, tool calls, tool outputs, and metadata such as model name, category, duration, cost, token usage, and whether tools were used.
The dataset is designed for research around code agents: tool-use imitation, code-agent distillation, supervised fine-tuning, trajectory modeling, reasoning/tool-call alignment, and evaluation of software-engineering behavior.

Why MIMO Claude Code Traces Exists
Modern coding agents are not just code generators. They inspect files, run shell commands, edit code, recover from tool errors, reason over long contexts, and gradually refine a solution across many turns. Training and evaluating this behavior requires more than isolated prompt-response pairs.
MIMO Claude Code Traces addresses this need by releasing complete agent trajectories collected in a Claude Code-style environment. This setting is useful because software-engineering agents naturally involve:
- multi-turn task decomposition and iterative implementation;
- file reading, editing, searching, and shell execution;
- tool-call planning and recovery from failed commands;
- debugging, refactoring, API integration, and devops workflows;
- long-context reasoning over code, logs, and intermediate observations;
- token, cost, and duration metadata for efficiency-aware analysis.
By preserving full event streams instead of only final answers, the dataset supports research on how code agents actually behave while solving tasks.
Model and Generation Setup
The traces were generated with mimo-v2.5-pro, MiMo’s most capable model at the time of release. MiMo-V2.5-Pro is a 1.02T-parameter Mixture-of-Experts model with 42B active parameters, a hybrid-attention architecture, and a 1M-token context window.
It is designed for agentic workloads, complex software engineering, and long-horizon tasks, with improved instruction following and coherence across ultra-long contexts. In tool-using harnesses, it can sustain complex trajectories spanning hundreds to more than a thousand tool calls.

The dataset was produced in an agentic coding setup with tools such as Bash, Read, Write, Edit, Glob, Grep, TodoWrite, and planning utilities. The dataset construction process used approximately 400M tokens in total, while the released trace metadata records approximately 127.2M logged usage tokens across input, cache-read, and output token fields.
Dataset Overview
Each .jsonl file contains one complete Claude Code-style event stream. The release is organized by task category under session/.
Key Statistics
| Statistic | Value |
|---|---|
| Total traces | 1,017 |
| Total JSONL files | 1,017 |
| Model | mimo-v2.5-pro |
| Generation budget | ~400M tokens |
| Logged usage tokens | 127,236,485 |
| Claude Code-style event rows | 15,046 |
| Conversation messages | 11,995 |
| Assistant tool calls | 5,271 |
| Tool result messages | 5,271 |
| Traces with tool calls | 859 |
| Traces with reasoning fields | 1,017 |
| Recorded turns | 4,932 |
| Recorded duration | ~20.5 hours |
| Recorded API cost field total | $163.89 |
Logged Usage Tokens
| Token field | Count |
|---|---|
| input_tokens | 8,033,778 |
| cache_read_input_tokens | 117,286,784 |
| cache_creation_input_tokens | 0 |
| output_tokens | 1,915,923 |
| Total logged usage tokens | 127,236,485 |
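The totals above can in principle be recomputed from the traces themselves. The following is a minimal sketch, assuming the four fields sit under `message.usage` on assistant events and that each event reports its own usage rather than a running total; `sum_usage` and the sample events are illustrative and are not part of the dataset tooling.

```python
from collections import Counter

# Field names as listed in the table above.
USAGE_FIELDS = (
    "input_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
    "output_tokens",
)


def sum_usage(events):
    """Sum the logged usage fields over assistant events.

    Assumption: usage is stored at event["message"]["usage"].
    """
    totals = Counter({field: 0 for field in USAGE_FIELDS})
    for event in events:
        if event.get("type") != "assistant":
            continue
        usage = (event.get("message") or {}).get("usage") or {}
        for field in USAGE_FIELDS:
            totals[field] += int(usage.get(field) or 0)
    return totals


# Synthetic events for illustration only (not real dataset rows).
sample_events = [
    {"type": "assistant", "message": {"usage": {
        "input_tokens": 10, "cache_read_input_tokens": 100, "output_tokens": 5}}},
    {"type": "user", "message": {"content": "hi"}},
    {"type": "assistant", "message": {"usage": {
        "input_tokens": 2, "output_tokens": 3}}},
]

totals = sum_usage(sample_events)
print(dict(totals), "total:", sum(totals.values()))
```

If the per-event usage turns out to be cumulative within a session, the sum would need to take only the last assistant event of each trace instead.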
Dataset Structure
The dataset is organized as a top-level README plus category folders. Each category folder contains JSONL traces for one type of coding-agent task.
mimo-claude-code-traces-1k/
├── README.md
└── session/
    ├── algorithms/
    │   └── *.jsonl
    ├── api_integration/
    │   └── *.jsonl
    ├── code_generation/
    │   └── *.jsonl
    ├── data_processing/
    │   └── *.jsonl
    ├── debugging/
    │   └── *.jsonl
    ├── hf_trace/
    │   └── *.jsonl
    ├── math_problems/
    │   └── *.jsonl
    ├── refactoring/
    │   └── *.jsonl
    ├── shell_devops/
    │   └── *.jsonl
    └── supplement/
        └── *.jsonl
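Because the category is encoded only in the folder name, a local copy can be indexed by walking the tree. A minimal sketch, assuming the layout shown above has been downloaded to a local directory; `count_traces_by_category` is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def count_traces_by_category(root):
    """Count *.jsonl files under <root>/session/<category>/."""
    counts = Counter()
    for path in Path(root).glob("session/*/*.jsonl"):
        # The parent folder name is the task category.
        counts[path.parent.name] += 1
    return counts


# Assumed local path; an empty result means the folder was not found.
counts = count_traces_by_category("mimo-claude-code-traces-1k")
for category, n in sorted(counts.items()):
    print(f"{category}: {n}")
```

On a complete download, the per-category counts should match the Traces column of the category table.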
Category Distribution
The dataset covers algorithmic tasks, code generation, debugging, refactoring, shell/devops, Hugging Face traces, data processing, and reasoning-heavy coding prompts.
| Category | Traces | Messages | Tool calls | Traces with tools | Turns |
|---|---|---|---|---|---|
| algorithms | 157 | 1,853 | 722 | 148 | 854 |
| api_integration | 23 | 1,300 | 984 | 23 | 214 |
| code_generation | 213 | 3,246 | 1,429 | 213 | 1,448 |
| data_processing | 58 | 885 | 442 | 58 | 345 |
| debugging | 162 | 941 | 252 | 96 | 380 |
| hf_trace | 57 | 637 | 316 | 36 | 225 |
| math_problems | 76 | 745 | 260 | 76 | 332 |
| refactoring | 126 | 796 | 216 | 64 | 339 |
| shell_devops | 70 | 901 | 416 | 70 | 486 |
| supplement | 75 | 691 | 234 | 75 | 309 |
| Total | 1,017 | 11,995 | 5,271 | 859 | 4,932 |
Token and Cost by Category
| Category | Logged tokens | Output tokens | Duration ms | Cost USD |
|---|---|---|---|---|
| algorithms | 23,302,782 | 335,330 | 12,891,693 | 29.179555 |
| api_integration | 5,273,760 | 52,907 | 3,989,450 | 14.793162 |
| code_generation | 41,079,654 | 771,091 | 24,428,456 | 51.162873 |
| data_processing | 8,638,826 | 112,405 | 4,548,507 | 10.566932 |
| debugging | 9,390,091 | 133,225 | 6,306,098 | 11.999537 |
| hf_trace | 4,365,211 | 82,627 | 4,164,129 | 10.435188 |
| math_problems | 8,867,295 | 128,300 | 5,003,409 | 9.130514 |
| refactoring | 8,571,921 | 101,568 | 5,166,704 | 10.012703 |
| shell_devops | 9,940,490 | 126,358 | 4,455,335 | 10.254762 |
| supplement | 7,806,455 | 72,112 | 2,985,143 | 6.351987 |
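For efficiency-aware comparisons, per-trace averages are often more informative than category totals. The sketch below derives them from the figures in the two category tables; the numbers are copied from this README, while `CATEGORY_STATS` and `per_trace_averages` are illustrative names.

```python
# category -> (traces, logged_tokens, duration_ms, cost_usd), copied from the tables.
CATEGORY_STATS = {
    "algorithms": (157, 23_302_782, 12_891_693, 29.179555),
    "api_integration": (23, 5_273_760, 3_989_450, 14.793162),
    "code_generation": (213, 41_079_654, 24_428_456, 51.162873),
    "data_processing": (58, 8_638_826, 4_548_507, 10.566932),
    "debugging": (162, 9_390_091, 6_306_098, 11.999537),
    "hf_trace": (57, 4_365_211, 4_164_129, 10.435188),
    "math_problems": (76, 8_867_295, 5_003_409, 9.130514),
    "refactoring": (126, 8_571_921, 5_166_704, 10.012703),
    "shell_devops": (70, 9_940_490, 4_455_335, 10.254762),
    "supplement": (75, 7_806_455, 2_985_143, 6.351987),
}


def per_trace_averages(stats):
    """Return tokens, seconds, and USD per trace for each category."""
    rows = {}
    for name, (traces, tokens, duration_ms, cost) in stats.items():
        rows[name] = {
            "tokens_per_trace": tokens / traces,
            "seconds_per_trace": duration_ms / 1000 / traces,
            "usd_per_trace": cost / traces,
        }
    return rows


averages = per_trace_averages(CATEGORY_STATS)
# Print categories from most to least expensive per trace.
for name, row in sorted(averages.items(),
                        key=lambda kv: kv[1]["usd_per_trace"], reverse=True):
    print(f"{name:<16} {row['tokens_per_trace']:>10,.0f} tok "
          f"{row['seconds_per_trace']:>7.1f} s  ${row['usd_per_trace']:.3f}")

# Cross-check against the Key Statistics table.
total_cost = sum(v[3] for v in CATEGORY_STATS.values())
total_hours = sum(v[2] for v in CATEGORY_STATS.values()) / 3_600_000
print(f"total cost ${total_cost:.2f}, total duration {total_hours:.1f} h")
```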
Tool Use
MIMO Claude Code Traces captures explicit tool-call behavior, including successful calls and tool error messages. This makes it useful for learning when to call tools, how to recover from failed calls, and how to combine shell/file operations with natural-language reasoning.
| Tool | Calls |
|---|---|
| Bash | 1,805 |
| Read | 1,480 |
| Write | 919 |
| Glob | 381 |
| Edit | 339 |
| Grep | 163 |
| Agent | 53 |
| EnterPlanMode | 38 |
| ExitPlanMode | 36 |
| AskUserQuestion | 28 |
| TodoWrite | 25 |
| TaskOutput | 2 |
| WebFetch | 1 |
| TaskStop | 1 |
| Total | 5,271 |
Available tool schemas are included in every trace. The common Claude Code-style tool inventory includes:
Task, AskUserQuestion, Bash, CronCreate, CronDelete, CronList, Edit,
EnterPlanMode, EnterWorktree, ExitPlanMode, ExitWorktree, Glob, Grep,
NotebookEdit, Read, ScheduleWakeup, Skill, TaskOutput, TaskStop,
TodoWrite, WebFetch, WebSearch, Write
Event Stream Schema
Each .jsonl file in session/<category>/ is one Claude Code-style event stream. Each line is one event.
Common top-level fields include:
| Field | Type | Description |
|---|---|---|
| type | string | Event type, such as mode, permission-mode, user, assistant, last-prompt, or system |
| sessionId | string | Session identifier |
| uuid | string | Event UUID. Missing original UUIDs are deterministically generated during conversion |
| parentUuid | string/null | Parent event UUID for message-chain reconstruction |
| timestamp | string | Event timestamp |
| cwd | string | Working directory recorded or normalized for the trace |
| version | string | Dataset conversion/version marker |
| message | object | User, assistant, or tool-result message payload |
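The `parentUuid` field makes it possible to recover message order even if events are shuffled. A minimal sketch, assuming each trace forms a single linear chain whose root has `parentUuid` set to null; `ordered_chain` is a hypothetical helper, and events without a `uuid` (such as the `mode` line in the examples) are skipped.

```python
def ordered_chain(events):
    """Order events from root to leaf by following parentUuid links.

    Assumption: one linear chain per trace. If an event has several
    children, only the first one encountered is followed.
    """
    children = {}
    for event in events:
        if event.get("uuid") is not None:
            children.setdefault(event.get("parentUuid"), []).append(event)

    chain, seen = [], set()
    parent = None  # the root event has parentUuid == null
    while children.get(parent):
        node = children[parent][0]
        if node["uuid"] in seen:  # guard against accidental cycles
            break
        seen.add(node["uuid"])
        chain.append(node)
        parent = node["uuid"]
    return chain


# Synthetic, shuffled events for illustration only.
shuffled = [
    {"type": "assistant", "uuid": "c", "parentUuid": "b"},
    {"type": "mode", "mode": "normal"},
    {"type": "user", "uuid": "a", "parentUuid": None},
    {"type": "assistant", "uuid": "b", "parentUuid": "a"},
]
print([event["uuid"] for event in ordered_chain(shuffled)])
```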
Assistant events store Claude-style message content blocks:
| Content block | Description |
|---|---|
| text | Assistant natural-language response |
| thinking | Reasoning content from the original trace |
| tool_use | Tool call with id, name, and input |
Tool outputs are represented as user events whose message.content contains tool_result blocks linked by tool_use_id.
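The `tool_use_id` link can be used to join each call with its output. The following is a minimal sketch of that join over one trace's events, assuming a `tool_use` block always appears before its matching `tool_result`; `pair_tool_calls` and the sample events are illustrative.

```python
def content_blocks(event):
    """Return the dict-shaped content blocks of an event (may be empty)."""
    content = (event.get("message") or {}).get("content")
    if not isinstance(content, list):  # plain-string content has no blocks
        return []
    return [block for block in content if isinstance(block, dict)]


def pair_tool_calls(events):
    """Return (tool_use, tool_result) pairs matched by tool_use_id."""
    calls = {}  # tool_use id -> tool_use block
    pairs = []
    for event in events:
        for block in content_blocks(event):
            if event.get("type") == "assistant" and block.get("type") == "tool_use":
                calls[block.get("id")] = block
            elif event.get("type") == "user" and block.get("type") == "tool_result":
                call = calls.get(block.get("tool_use_id"))
                if call is not None:
                    pairs.append((call, block))
    return pairs


# Synthetic events shaped like the examples in this README.
sample_events = [
    {"type": "user", "message": {"role": "user", "content": "List the files."}},
    {"type": "assistant", "message": {"role": "assistant", "content": [
        {"type": "text", "text": "Let me look."},
        {"type": "tool_use", "id": "call_1", "name": "Bash",
         "input": {"command": "ls"}},
    ]}},
    {"type": "user", "message": {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "call_1",
         "content": "README.md", "is_error": False},
    ]}},
]

for call, result in pair_tool_calls(sample_events):
    print(call["name"], call["input"], "->", result["content"])
```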
Event-Type Counts
| Event type | Count |
|---|---|
| mode | 1,017 |
| permission-mode | 1,017 |
| user | 6,288 |
| assistant | 4,690 |
| last-prompt | 1,017 |
| system | 1,017 |
| Total | 15,046 |
Example Event Lines
{"type":"mode","mode":"normal","sessionId":"7469deea-7e45-4732-8f06-9666d52052d4"}
{"type":"user","message":{"role":"user","content":"Implement Kahn's algorithm for topological sort..."},"sessionId":"7469deea-7e45-4732-8f06-9666d52052d4","uuid":"...","parentUuid":null}
{"type":"assistant","message":{"model":"mimo-v2.5-pro","role":"assistant","content":[{"type":"thinking","thinking":"..."},{"type":"text","text":"Let me explore the codebase."},{"type":"tool_use","id":"call_...","name":"Bash","input":{"command":"ls /data/agent/choucisan"}}]},"sessionId":"7469deea-7e45-4732-8f06-9666d52052d4","uuid":"...","parentUuid":"..."}
{"type":"user","message":{"role":"user","content":[{"type":"tool_result","tool_use_id":"call_...","content":"...","is_error":false}]},"sessionId":"7469deea-7e45-4732-8f06-9666d52052d4","uuid":"...","parentUuid":"..."}
Some fields are normalized because the original collected data did not include the full Claude Code runtime envelope. In particular, uuid, parentUuid, requestId, cwd, version, and gitBranch are deterministic conversion fields rather than raw Claude Code runtime fields.
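The `is_error` flag on `tool_result` blocks, visible in the example lines above, gives a simple handle on failed calls. A minimal sketch that tallies calls and errors per tool name; `error_counts_by_tool` is a hypothetical helper, and it assumes `is_error` is present and truthy on failed results.

```python
from collections import Counter


def error_counts_by_tool(events):
    """Return (calls, errors) Counters keyed by tool name."""
    names = {}  # tool_use id -> tool name
    calls, errors = Counter(), Counter()
    for event in events:
        content = (event.get("message") or {}).get("content")
        if not isinstance(content, list):
            continue
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use":
                names[block.get("id")] = block.get("name")
                calls[block.get("name")] += 1
            elif block.get("type") == "tool_result" and block.get("is_error"):
                # Results without a known call are grouped under "<unknown>".
                errors[names.get(block.get("tool_use_id"), "<unknown>")] += 1
    return calls, errors


# Synthetic events for illustration only.
sample_events = [
    {"type": "assistant", "message": {"content": [
        {"type": "tool_use", "id": "call_1", "name": "Read", "input": {}},
        {"type": "tool_use", "id": "call_2", "name": "Bash", "input": {}},
    ]}},
    {"type": "user", "message": {"content": [
        {"type": "tool_result", "tool_use_id": "call_1",
         "content": "File does not exist.", "is_error": True},
        {"type": "tool_result", "tool_use_id": "call_2",
         "content": "ok", "is_error": False},
    ]}},
]

calls, errors = error_counts_by_tool(sample_events)
for name in calls:
    print(f"{name}: {errors[name]}/{calls[name]} calls returned an error")
```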
Highlights
- Complete trajectories: Each file contains a full coding-agent event stream, not just a final answer.
- Reasoning fields: All 1,017 traces include assistant reasoning fields.
- Tool-rich behavior: 859 traces include tool calls, with 5,271 assistant tool calls in total.
- Claude Code-style format: Events follow a familiar agent trace shape with assistant, user, tool_use, and tool_result blocks.
- Broad task coverage: Categories include algorithms, debugging, refactoring, shell/devops, API integration, data processing, and math-heavy coding prompts.
- Efficiency metadata: Token usage, duration, and recorded cost fields support cost-aware and latency-aware analysis.
- Recovery behavior: Tool errors, failed edits, missing files, and shell diagnostics are retained for studying agent robustness.
Quick Start
Load the dataset directly from Hugging Face:
from datasets import load_dataset
repo_id = "choucsan/mimo-claude-code-traces-1k"
dataset = load_dataset(repo_id, data_files="session/**/*.jsonl")
print(dataset["train"][0])
Read local JSONL files:
import json
from pathlib import Path

root = Path("mimo-claude-code-traces-1k")
# One JSONL file per trace, grouped under session/<category>/.
files = sorted((root / "session").glob("*/*.jsonl"))

events = []
for path in files:
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                events.append(json.loads(line))

print(len(events))
print(events[0])
Count tool calls:
from collections import Counter

# Uses the `events` list built in the previous snippet.
tool_counts = Counter()
for event in events:
    message = event.get("message") or {}
    # content is either a plain string or a list of content blocks
    for block in message.get("content", []) or []:
        if isinstance(block, dict) and block.get("type") == "tool_use":
            tool_counts[block.get("name")] += 1

print(tool_counts.most_common())
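Loading every file into one flat list mixes events from different traces, so trace-level analysis needs a regrouping step. A minimal sketch, assuming every event carries the `sessionId` field described in the schema; `group_by_session` and `has_tool_call` are hypothetical helper names.

```python
from collections import defaultdict


def group_by_session(events):
    """Group a flat list of events into {sessionId: [events...]}."""
    traces = defaultdict(list)
    for event in events:
        traces[event.get("sessionId")].append(event)
    return dict(traces)


def has_tool_call(trace_events):
    """True if any event in the trace contains a tool_use block."""
    for event in trace_events:
        content = (event.get("message") or {}).get("content")
        if isinstance(content, list) and any(
            isinstance(block, dict) and block.get("type") == "tool_use"
            for block in content
        ):
            return True
    return False


# Synthetic events for illustration only.
sample_events = [
    {"type": "user", "sessionId": "s1", "message": {"content": "Fix the bug."}},
    {"type": "assistant", "sessionId": "s1", "message": {"content": [
        {"type": "tool_use", "id": "call_1", "name": "Read", "input": {}}]}},
    {"type": "user", "sessionId": "s2", "message": {"content": "What is 2+2?"}},
    {"type": "assistant", "sessionId": "s2", "message": {"content": [
        {"type": "text", "text": "4"}]}},
]

traces = group_by_session(sample_events)
with_tools = sum(has_tool_call(trace) for trace in traces.values())
print(f"{len(traces)} traces, {with_tools} with tool calls")
```

Run over the full release, the second number can be compared with the "Traces with tool calls" row in Key Statistics.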
Applications
MIMO Claude Code Traces can be used in several research and development settings.
| Application | How MIMO Claude Code Traces Helps |
|---|---|
| Code-agent distillation | Distills mimo-v2.5-pro agent behavior into smaller code models |
| Supervised fine-tuning | Trains coding assistants on task-to-trajectory data |
| Tool-call prediction | Learns when to call shell, read, write, edit, grep, or planning tools |
| Reasoning/tool alignment | Connects assistant reasoning fields with subsequent tool-use decisions |
| Offline RL | Provides complete trajectories for tool-using code-agent policy learning |
| Reward modeling | Supports trace-level and step-level preference modeling for tool choice, edit quality, or task completion |
| Debugging research | Preserves failed commands, shell diagnostics, tool errors, and recovery attempts |
| Refactoring and cleanup | Captures multi-step codebase edits and iterative improvements |
| Shell and devops workflows | Includes command-line operations, file inspection, and execution feedback |
| Cost-aware agents | Uses token, duration, and cost metadata for efficiency-aware modeling |
| Evaluation harnesses | Benchmarks parsers, function-calling policies, and Claude Code-style trace consumers |
Data Quality Notes
- Each file in session/<category>/ is a Claude Code-style JSONL event stream for one trace.
- All released traces report metadata.model = "mimo-v2.5-pro".
- All 1,017 traces include reasoning fields.
- 859 traces include at least one tool call.
- Tool outputs may include errors, failed edits, missing files, or shell diagnostics; these are retained because they are useful for training recovery behavior.
- The dataset contains generated coding-agent traces, not verified production patches. Users should validate code before using examples as ground truth.
- The event stream is converted from normalized trace records. Some Claude Code envelope fields are deterministic reconstruction fields because the original records did not include raw runtime UUIDs or local harness metadata.
- The recorded token accounting in assistant message.usage reflects logged trace metadata and does not include all tokens used during the broader dataset construction process.
Citation and Contact
If MIMO Claude Code Traces helps your work, please consider linking back to the dataset page. For questions, corrections, or collaboration, contact choucisan@gmail.com.