AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
pyvene.ai, The Stanford NLP Group
Features in GEMMA-2-9B-IT@20-axbench-reft-r1-res-16k