Data | Denser ≠ Better

Domain	Training	Evaluation: In-Distribution	Evaluation: Out-of-Distribution
Math	DAPO-Math-17k ¹	AIME (2024, 2025) ² ³	Math500 ⁴
Chemistry	SciKnowEval ⁵	SciKnowEval ⁵	GPQA ⁶
Tool Use	ToolAlpaca ⁷	ToolAlpaca ⁷	BFCLv4 ⁸
Code	Dolci-Think-RL-7B ⁹	Dolci-Think-RL-7B ⁹	LCBv6 ¹⁰
Logic	–	–	ZLogic ¹¹
Knowledge	–	–	MMLU-R ¹²
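
To make the split concrete, here is a minimal sketch of how it could be encoded as a configuration for an evaluation harness. The `DomainSplit` class and the helper functions are hypothetical. The strings are the table's labels, not Hugging Face dataset IDs or loader names. A `None` training set marks the two domains that appear only in out-of-distribution evaluation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DomainSplit:
    """Datasets used for one domain (hypothetical structure, not from the source)."""
    train: Optional[str]          # None: no training data for this domain
    eval_in_dist: Optional[str]   # None: no in-distribution evaluation set
    eval_out_dist: str            # every domain has an out-of-distribution benchmark


# Labels copied from the table; they are display names, not loader identifiers.
SPLITS = {
    "Math": DomainSplit("DAPO-Math-17k", "AIME", "Math500"),
    "Chemistry": DomainSplit("SciKnowEval", "SciKnowEval", "GPQA"),
    "Tool Use": DomainSplit("ToolAlpaca", "ToolAlpaca", "BFCLv4"),
    "Code": DomainSplit("Dolci-Think-RL-7B", "Dolci-Think-RL-7B", "LCBv6"),
    "Logic": DomainSplit(None, None, "ZLogic"),
    "Knowledge": DomainSplit(None, None, "MMLU-R"),
}


def trained_domains(splits):
    """Return the domains that contribute training data, in table order."""
    return [name for name, s in splits.items() if s.train is not None]


def held_out_domains(splits):
    """Return the domains used only for out-of-distribution evaluation."""
    return [name for name, s in splits.items() if s.train is None]


def ood_benchmarks(splits):
    """Return the out-of-distribution benchmark label for every domain."""
    return [s.eval_out_dist for s in splits.values()]


if __name__ == "__main__":
    print("train on:", trained_domains(SPLITS))
    print("held out:", held_out_domains(SPLITS))
    print("OOD eval:", ood_benchmarks(SPLITS))
```

Storing the held-out domains with `None` for training makes it explicit in code that Logic and Knowledge serve only as out-of-distribution checks, rather than relying on a placeholder string such as "–".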

¹ https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k
² https://huggingface.co/datasets/math-ai/aime24
³ https://huggingface.co/datasets/math-ai/aime25
⁴ https://huggingface.co/datasets/math-ai/math500
⁵ https://huggingface.co/datasets/hicai-zju/SciKnowEval
⁶ https://huggingface.co/datasets/Idavidrein/gpqa
⁷ https://github.com/tangqiaoyu/ToolAlpaca/tree/main/data
⁸ https://github.com/ShishirPatil/gorilla/blob/main/berkeley-function-call-leaderboard/bfcl_eval/data/BFCL_v4_multiple.json
⁹ https://huggingface.co/datasets/allenai/Dolci-Think-RL-7B
¹⁰ https://huggingface.co/datasets/livecodebench/code_generation_lite
¹¹ https://huggingface.co/datasets/allenai/ZebraLogicBench-private
¹² https://huggingface.co/datasets/edinburgh-dawg/mmlu-redux-2.0