Neuronpedia
Finding Misaligned Persona Features in Open-Weight Models
Andy Arditi · lesswrong.com · huggingface.co · huggingface.co
misaligned-persona
Features in LLAMA3.1-8B-IT @ 11-resid-post-aa