© Neuronpedia 2026


GPT2-Small · Transcoders Residuals · 8-TRES-DC · Feature index 1069

Explanations

the word "instead" and its variations, emphasizing contrasts or alternatives

oai_token-act-pair · gpt-4o-mini · triggered by @bot


Negative Logits

Token       Logit
"vation"    -0.81
"Status"    -0.68
"◼"         -0.66
"licted"    -0.66
"nostic"    -0.66
"answer"    -0.64
"gerald"    -0.63
"mor"       -0.62
"mot"       -0.62
"/>"        -0.62

Positive Logits

Token       Logit
"forth"     0.69
"we"        0.68
" they"     0.66
" Bohem"    0.66
"ihara"     0.65
"of"        0.63
"of"        0.63
"OF"        0.62
"','"       0.59
"itiz"      0.59
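The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The sketch below shows one way such lists can be produced. It assumes, as is common for feature dashboards but not stated on this page, that each value is the dot product of the feature's decoder direction with a token's unembedding vector. The function names `logit_effects` and `top_and_bottom` and the toy 2-dimensional vectors are hypothetical. Real pipelines may also account for the final LayerNorm, which this sketch ignores.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot a (hypothetical) feature decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, logit) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by logit
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy data (made up): a 2-d residual stream and three tokens.
direction = [1.0, -0.5]
unembed = {
    "forth": [0.5, -0.4],   # 0.5 + 0.2   ≈ 0.7
    "we": [0.3, 0.2],       # 0.3 - 0.1   ≈ 0.2
    "vation": [-0.6, 0.5],  # -0.6 - 0.25 ≈ -0.85
}
positive, negative = top_and_bottom(logit_effects(direction, unembed), k=1)
print(positive, negative)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; with a full vocabulary a partial selection (for example a heap) would avoid sorting every token.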

Activation density: 0.015%

No known activations.
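Activation density is the share of sampled tokens on which the feature fires. A value of 0.015% corresponds to roughly 15 firing tokens per 100,000. The sketch below assumes that "fires" means a strictly positive activation; the threshold actually used for this page is not stated. The function name `activation_density` and the synthetic activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Synthetic example: 15 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 99_985 + [2.3] * 15
print(f"{activation_density(acts):.3f}%")  # prints 0.015%
```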