Neuronpedia


© Neuronpedia 2025

Dunefsky · Chlenski · Transcoders Enable Fine-Grained Interpretable Circuit Analysis
GPT2-Small · Transcoders Residuals · 1-TRES-DC · Index 4066

Explanations

the word "wasn't" and its variations

oai_token-act-pair · gpt-4o-mini Triggered by @bot


Negative Logits

Token              Logit
" Pigs"            -0.78
" Powered"         -0.77
"birds"            -0.71
"ONS"              -0.71
" Dise"            -0.67
"dragon"           -0.66
" coefficients"    -0.66
" dates"           -0.65
"pell"             -0.65
"planes"           -0.65

Positive Logits

Token              Logit
"amacare"          0.90
"uala"             0.87
"ten"              0.83
"�"                0.82
"nel"              0.73
"gan"              0.72
"tyard"            0.70
"rane"             0.70
"ts"               0.69
"eday"             0.68

Activation Density: 0.018%

No Known Activations