Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR

Published in Preprint, 2025
