© Neuronpedia 2026

Neuronpedia

GPT2-Small · Transcoders Residuals · 8-TRES-DC · Index 578


Explanations

references to the second person pronoun "you."

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Negative Logits

Token          Logit
" adolesc"     -0.67
"hander"       -0.65
"pires"        -0.60
"frames"       -0.59
"forms"        -0.57
"grades"       -0.56
"plates"       -0.56
"stellar"      -0.55
"Marginal"     -0.55
" purs"        -0.55

Positive Logits

Token              Logit
"'re"              1.03
"asel"             0.91
"'ve"              0.85
"imar"             0.83
" expire"          0.81
"are"              0.79
"atars"            0.78
" aren"            0.78
" underestimate"   0.77
" disapprove"      0.77

Activations Density 0.233%

No Known Activations