Model Mechanics
2026
10
- Bad Is Good: Why DeepSeek Did Not Use an n-Gram Structure
- Reading Model States Through Newline Tokens: A Note from Word Salad Chopper
- Why Output Tokens Are More Expensive: From KV Cache to Agent Cost Engineering
- What Does Model Routing Actually Solve? From Agent Cost and Latency to Reasoning Control
- Why Better Simulators Often Combine Learning and Rules: From PDEs and Ray Tracing to DLSS
- Parameter-Efficient Fine-Tuning (PEFT): From Adapter to LoRA
- Why Language Models Hallucinate
- What Does the Loss Landscape of LLMs Look Like?
- Compression for AGI: Compression as Intelligence
- Neural Scaling Laws: From Kaplan to Chinchilla