Data
| Domain | Training | Evaluation: In-Distribution | Evaluation: Out-of-Distribution |
|---|---|---|---|
| Math | DAPO-Math-17k [1] | AIME [2, 3] | Math500 [4] |
| Chemistry | SciKnowEval [5] | SciKnowEval [5] | GPQA [6] |
| Tool Use | ToolAlpaca [7] | ToolAlpaca [7] | BFCLv4 [8] |
| Code | Dolci-Think-RL-7B [9] | Dolci-Think-RL-7B [9] | LCBv6 [10] |
| Logic | – | – | ZLogic [11] |
| Knowledge | – | – | MMLU-R [12] |

- [1] https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k
- [2] https://huggingface.co/datasets/math-ai/aime24
- [3] https://huggingface.co/datasets/math-ai/aime25
- [4] https://huggingface.co/datasets/math-ai/math500
- [6] https://huggingface.co/datasets/Idavidrein/gpqa
- [7] https://github.com/tangqiaoyu/ToolAlpaca/tree/main/data
- [8] https://github.com/ShishirPatil/gorilla/blob/main/berkeley-function-call-leaderboard/bfcl_eval/data/BFCL_v4_multiple.json
- [9] https://huggingface.co/datasets/allenai/Dolci-Think-RL-7B
- [10] https://huggingface.co/datasets/livecodebench/code_generation_lite
- [11] https://huggingface.co/datasets/allenai/ZebraLogicBench-private
- [12] https://huggingface.co/datasets/edinburgh-dawg/mmlu-redux-2.0