👋 About Me
I am a second-year research master’s student at Tsinghua University. My research interests span Computer Vision, AI agents, AIGC, Reinforcement Learning, Transfer Learning, Embodied AI, MLLM & LLM & VLA.
🎓 I graduated first in my college (rank 3/109) with a B.S. in Cyberspace Security from the School of Computer Science and Engineering at UESTC (University of Electronic Science and Technology of China) in 2024. I expect to graduate from Tsinghua University in Fall 2027.
🔥 I am actively seeking a PhD position starting Fall 2027 in MLLM, AIGC, and Embodied AI!
🔥 I am also looking for RA/visiting student opportunities in MLLM, AIGC, and Embodied AI!
🔥 News
- 2025.12: 🎉 We released a new paper, ReflexFlow, on alleviating exposure bias in Flow Matching.
- 2025.09: 🎉 We released a new paper, VERL, on Reinforcement Learning for LLM reasoning.
- 2025.09: 📄 OTQMS, our work on computing the optimal transfer quantities in transfer learning, was accepted by NeurIPS 2025.
- 2024.09: 📄 Our unleash-then-eliminate method, which reduces hallucinations while generating more detailed sentences, was accepted by NeurIPS 2024.
📝 Publications (* Equal Contribution, † Corresponding Authors)
Preprint


ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow Matching
Guanbo Huang*, Jingjia Mao*, Fanding Huang*, Fengkai Liu, Xiangyang Luo, Yaoyuan Liang, Jiasheng Lu, Xiaoe Wang, Pei Liu, Ruiliu Fu†, Shao-Lun Huang†
This work proposes a learning target for Anti-Drift Rectification and a dynamic reweighting loss for Frequency Compensation to alleviate exposure bias.
Preprint 2025-12-04
Preprint


Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
Fanding Huang*, Guanbo Huang*, Xiao Fan, Yi He, Xiao Liang, Xiao Chen, Qinting Jiang, Faisal Nadeem Khan, Jingyan Jiang†, Zhi Wang†
This work is the first to analyze the hidden state dynamics for Reinforcement Learning in LLM reasoning.
Preprint 2025-10-30
NeurIPS-2025


A High-Dimensional Statistical Method for Optimizing Transfer Quantities in Multi-Source Transfer Learning
Qingyue Zhang*, Haohao Fu*, Guanbo Huang*, Yaoyuan Liang, Chang Chu, Tianren Peng, Yanru Wu, Qi Li, Yang Li†, Shao-Lun Huang†
This work uses Fisher information theory to calculate the optimal quantity of training data to sample from each source task for the target task in multi-source transfer learning.
NeurIPS 2025-10-29
NeurIPS-2024


Unleashing Region Understanding in Intermediate Layers for MLLM-based Referring Expression Generation
Yaoyuan Liang*, Zhuojun Cai*, Xu Jian, Guanbo Huang, Yiran Wang, Xiao Liang, Jiahao Liu, Ziran Li, Jingang Wang, Shao-Lun Huang†
This work proposes a training-free framework, named “unleash-then-eliminate”, which first elicits the latent information in the intermediate layers and then applies a cycle-consistency-based decoding method to reduce hallucinations.
NeurIPS 2024-09-25
🎖 Honors and Awards
- Merit-based Scholarships (University and National Level): three university first-class scholarships and two national scholarships.
- Outstanding Graduate (University and Provincial Level), 2024.05.
- National Second Prize, National College Student Information Security Contest, 2023.08.
- Mathematical Modeling Contests: Provincial First Prize in CUMCM; Honorable Mention in the American MCM/ICM.
🏫 Education
- 2024.08 - 2027.06, M.Eng. in Data Science and Information Technology @ Shenzhen International Graduate School (SIGS), Tsinghua University, Shenzhen.
- 2021.06 - 2024.06, B.Eng. in Cyber Security @ School of Computer Science and Engineering (Cyber Security), UESTC, Chengdu.
- 2020.09 - 2021.06, Engineering Experiment Class (Electronic Engineering) @ School of Electronic Science and Engineering, UESTC, Chengdu.
💻 Experience
- 2025.06 - 2025.12, Algorithm Researcher @ Huawei 2012 Lab (Central Media Institute), Shenzhen. Task: Video Generation Foundation Model Training (Pangu-T2V, 4.5B).
  - Data Enhancement: Utilized Qwen2.5 for video recaptioning; implemented pipelines to remove subtitles, watermarks, logos, and black borders to improve data quality.
  - Training Strategy: Conducted mixed training with image and video data; performed Time-Shifting training and inference tests.
  - Optimization: Monitored main experimental results; conducted inter-frame loss testing and reproduced RePa self-supervision methods.
- 2024.10 - 2025.03, Algorithm Engineer @ Shenzhen Zhimou Future Tech (智眸未来科技), Shenzhen. Task: Improving Accuracy & Efficiency of Action Detection.
  - Algorithm Optimization: Reproduced SOTA papers utilizing multi-layer interaction between human and object information, achieving >95% action recognition accuracy.
  - Efficiency: Optimized the detection pipeline, reducing latency from 200 ms/frame to 50 ms/frame, enabling real-time detection.
  - Outcome: Patent "A Method, Device, Equipment and Storage Medium for Action Detection" (No. CN119649472A).
