© Neuronpedia 2026

Neuronpedia

GPT2-Small / Transcoders Residuals / 8-TRES-DC / Index 63

Explanations

phrases emphasizing consistency or similarity

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Negative Logits

Token          Logit
" etched"      -0.66
"dale"         -0.63
"り"           -0.60
" spaced"      -0.60
"bard"         -0.59
" breath"      -0.59
"omn"          -0.56
"com"          -0.55
" quoted"      -0.55
" initially"   -0.55

Positive Logits

Token          Logit
"same"         0.87
" same"        0.77
"ourke"        0.74
"chwitz"       0.73
" result"      0.72
"conn"         0.70
"ouses"        0.70
"olini"        0.69
" Same"        0.67
"ighed"        0.67

Activations Density: 0.162%

No Known Activations