© Neuronpedia 2026


GPT2-SMALL · 8-TRES-DC · 218 ｜ Neuronpedia


Explanations

instances of the word "to" indicating actions or submissions

oai_token-act-pair · gpt-4o-mini Triggered by @bot


Negative Logits

Token                   Logit
"ワン"                  -0.96
"覚醒"                  -0.86
" freely"               -0.77
"グ"                    -0.75
" cheaply"              -0.74
"leeve"                 -0.71
"bodied"                -0.69
"accessible"            -0.68
" furiously"            -0.67
(token not captured)    -0.67

Positive Logits

Token                   Logit
" Michele"              0.75
" Polit"                0.73
"us"                    0.71
"me"                    0.71
" POLITICO"             0.70
" Manny"                0.70
" Danielle"             0.69
" Ralph"                0.68
" Northwestern"         0.68
" Herb"                 0.68
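
The two logit lists can be read as the opposite ends of a single per-token score vector for this feature. The sketch below shows that reading, assuming the scores are already available as a token-to-value mapping; the function name `top_and_bottom` and the `example` dictionary are hypothetical, the values are a few entries copied from the lists on this page, and nothing here claims to reproduce how Neuronpedia computes the scores.

```python
from typing import Dict, List, Tuple


def top_and_bottom(
    logits: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs.

    Hypothetical helper: `logits` maps a token string to the feature's
    effect on that token's output logit.
    """
    ranked = sorted(logits.items(), key=lambda item: item[1])  # ascending
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# A handful of values taken from the lists on this page (not the full vocabulary).
example = {
    " Michele": 0.75,
    " Polit": 0.73,
    "ワン": -0.96,
    "覚醒": -0.86,
    " freely": -0.77,
}

negative, positive = top_and_bottom(example, k=2)
```

Leading spaces are kept inside the token strings because GPT-2 tokens such as " freely" and "freely" are distinct vocabulary entries.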

Activations Density 0.211%
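
An activation density figure like this is commonly read as the fraction of sampled tokens on which the feature is active. A minimal sketch of that reading follows, assuming "active" means an activation strictly above a threshold of zero; the function name `activation_density`, the threshold, and the sample counts are illustrative assumptions, not values or definitions taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation exceeds `threshold`.

    Hypothetical helper: the zero threshold is an assumption about what
    counts as the feature "firing".
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Illustrative sample: 211 active tokens out of 100,000 gives 0.211%.
sample = [0.0] * 99_789 + [1.5] * 211
density = activation_density(sample)
density_percent = f"{density * 100:.3f}%"
```

Under this reading, a density of 0.211% means the feature fires on roughly 2 tokens in every 1,000.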

No Known Activations