© Neuronpedia 2026

Neuronpedia

GPT2-Small / Transcoders Residuals / 8-TRES-DC / Index 5

Explanations

negations or expressions of lack

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Negative Logits

Token             Logit
" timely"         -0.76
" moving"         -0.70
" populated"      -0.68
" phased"         -0.65
" availability"   -0.65
" achieving"      -0.62
" parity"         -0.62
" updating"       -0.62
" succeed"        -0.61
" immersed"       -0.60

Positive Logits

Token             Logit
"'t"              1.09
"�士"             0.84
"ishes"           0.81
"oho"             0.80
"uts"             0.79
"itates"          0.75
"oit"             0.74
"alled"           0.74
"lig"             0.73
"lus"             0.73
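
Logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking tokens by the resulting score. The page does not state how its values were computed, so the following is only a minimal sketch under that assumption. The function names, the two-dimensional vectors, and the three-token vocabulary are all hypothetical and do not reproduce the real weights.

```python
def logit_effects(decoder_dir, unembed_rows, vocab):
    """Dot the feature's decoder direction with each token's unembedding vector."""
    scores = {}
    for token, row in zip(vocab, unembed_rows):
        scores[token] = sum(d * u for d, u in zip(decoder_dir, row))
    return scores


def top_k(scores, k, largest=True):
    """Return the k tokens with the largest (or smallest) scores."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=largest)[:k]


# Made-up toy values, NOT the real GPT2-Small weights.
vocab = ["'t", " timely", "ishes"]
decoder_dir = [1.0, 0.5]
unembed_rows = [
    [1.0, 0.0],    # "'t"
    [-1.0, 0.5],   # " timely"
    [0.5, 0.5],    # "ishes"
]

scores = logit_effects(decoder_dir, unembed_rows, vocab)
print("most promoted:", top_k(scores, 1, largest=True))
print("most suppressed:", top_k(scores, 1, largest=False))
```

Under this reading, a positive score means the feature pushes the model toward predicting that token when it fires, and a negative score means it pushes away from it.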

Activations Density 0.131%
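
Assuming the density figure is the fraction of sampled tokens on which the feature has a non-zero activation (the page does not define it), the calculation can be sketched as follows. The function name and the sample are hypothetical and chosen only so the arithmetic matches 0.131%.

```python
def activation_density(activations):
    """Fraction of tokens whose activation is strictly positive."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 131 of which activate the feature.
sample = [0.0] * 99_869 + [1.5] * 131
density = activation_density(sample)
print(f"{density:.3%}")
```

At this density the feature would fire on roughly 1.3 of every 1,000 tokens.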

No Known Activations