© Neuronpedia 2026

Neuronpedia

GPT2-Small · Transcoders Residuals · 8-TRES-DC · Index 919

Explanations

instances of the word "perhaps."

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Negative Logits

itars       -0.86
gd          -0.76
■           -0.74
••          -0.73
ILS         -0.72
duction     -0.72
XT          -0.72
Posts       -0.71
 Topics     -0.69
¶           -0.69

Positive Logits

 predictably      0.95
 undes            0.79
 unsurprisingly   0.76
 nowhere          0.75
 overlooked       0.75
 deadliest        0.74
 easiest          0.73
 toughest         0.73
 understandable   0.72
 quickest         0.72
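
For readers unfamiliar with these two lists, the sketch below illustrates one common way such per-token logit effects are derived: take the dot product of the feature's decoder direction with each token's column of the model's unembedding matrix, then list the largest and smallest results. This is an assumption about the general technique, not a statement of how this page computed its numbers. The function names, the 3-dimensional model, the 4-token vocabulary, and every numeric value are hypothetical.

```python
# Toy sketch: rank vocabulary tokens by how much a feature's decoder
# direction would raise or lower their logits. All values are made up
# for illustration; they are not taken from GPT2-Small.

def token_logit_effects(decoder_direction, unembedding, vocab):
    """Map each token to dot(decoder_direction, that token's unembedding column)."""
    effects = {}
    for token_id, token in enumerate(vocab):
        column = [row[token_id] for row in unembedding]
        effects[token] = sum(d * w for d, w in zip(decoder_direction, column))
    return effects

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]

# Hypothetical 3-dimensional model with a 4-token vocabulary.
vocab = [" predictably", " easiest", "gd", "XT"]
unembedding = [  # shape: d_model x vocab_size
    [0.9, 0.5, -0.6, -0.2],
    [0.1, 0.4, -0.3, -0.7],
    [0.0, 0.2, 0.1, 0.3],
]
decoder_direction = [1.0, 0.5, 0.0]

effects = token_logit_effects(decoder_direction, unembedding, vocab)
positive, negative = top_and_bottom(effects, k=2)

for token, value in positive:
    print(f"positive {token!r}: {value:.2f}")
for token, value in negative:
    print(f"negative {token!r}: {value:.2f}")
```

Real pipelines often apply the model's final layer normalization before the unembedding step; the sketch omits that to stay short.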

Activations Density 0.039%

No Known Activations