Foundation Models
2026
10
- JoyAI-VL-Interaction: From Chat Back to Continuous Interaction
- Bad Is Good: Why DeepSeek Did Not Use an n-Gram Structure
- Reading Model States Through Newline Tokens: A Note from Word Salad Chopper
- MineCLIP, Visual Signals, and Reward Design
- Reward and Training Loops in Real Agents: From Data Governance to Online RL
- Agentic RL: Why the Training Loop Matters More Than the Algorithm
- Reward Hacking: When Optimizers Reverse-Search the Reward Signal
- The Evolution of Reward Design: From RLHF to RLVR
- Reinforcement Learning in LLM Alignment: From Reward Signals to Advantage Estimation
- Parameter-Efficient Fine-Tuning (PEFT): From Adapter to LoRA