Tag: Alignment | Hyacehila's Blog

Hyacehila's Blog

HOME
ARCHIVES
ME
PROJECT
ABOUT
- FOOTPRINTS
- FRIENDS
- CV


Alignment

2026 (5 posts)

Reward Hacking: When Optimizers Reverse-Search the Reward Signal
The Evolution of Reward Design: From RLHF to RLVR
Reinforcement Learning in LLM Alignment: From Reward Signals to Advantage Estimation
The Essence of LLM Reasoning and Training: From Surrogates to Reinforcement Learning Geometry
What Does the Loss Landscape of LLMs Look Like?

2025 (4 posts)

Re0-05: TRL GRPOTrainer (Practice)
Re0-04: TRL GRPOTrainer (Theory)
Re0-03: HuggingFace TRL DPOTrainer
Re0-02: HuggingFace TRL SFTTrainer

© 2025 - 2026 Hyacehila

103 posts in total

POWERED BY Hexo THEME Redefine v2.9.0
