© Neuronpedia 2026

Neuronpedia

GPT2-Small · Transcoders Residuals · 8-TRES-DC · Index 457

Explanations

references to news media outlets

Method: oai_token-act-pair · Explainer: gpt-4o-mini · Triggered by @bot

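Negative Logits

(Quotes mark token boundaries; a leading space is part of the token.)

Token          Logit
"chell"        -0.78
"uve"          -0.65
"ouf"          -0.64
" Buckley"     -0.63
"iod"          -0.63
"jri"          -0.62
"sen"          -0.61
" Citiz"       -0.61
"bern"         -0.60
" Stephens"    -0.60

Positive Logits

Token          Logit
"utics"        0.70
"atts"         0.67
"letters"      0.62
"efeated"      0.62
"activated"    0.59
"amous"        0.58
"utical"       0.58
"onential"     0.57
"pees"         0.56
"VERTISEMENT"  0.56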
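The logit tables list the tokens whose output logits this feature most directly raises or lowers. A common way dashboards compute such scores is to project the feature's decoder (output) direction through the model's unembedding matrix; the sketch below assumes that approach and ignores the final layer norm, so it is not necessarily how these exact values were produced. Random toy matrices stand in for the real GPT2-Small weights, and the helper name `top_logit_effects` is hypothetical.

```python
import numpy as np


def top_logit_effects(w_dec, w_u, feature_idx, k=10):
    """Return the k most positive and k most negative direct logit effects.

    Hypothetical helper. Assumes:
      w_dec: (n_features, d_model) decoder matrix of the transcoder
      w_u:   (d_model, vocab_size) unembedding matrix
    The final layer norm is ignored in this sketch.
    """
    # Project the feature's output direction onto every vocabulary token.
    scores = w_dec[feature_idx] @ w_u  # shape: (vocab_size,)
    order = np.argsort(scores)  # token ids in ascending order of score
    top_neg = [(int(i), float(scores[i])) for i in order[:k]]
    top_pos = [(int(i), float(scores[i])) for i in order[::-1][:k]]
    return top_pos, top_neg


# Toy example: random weights standing in for the real model.
rng = np.random.default_rng(0)
d_model, vocab_size, n_features = 16, 100, 8
w_dec = rng.standard_normal((n_features, d_model))
w_u = rng.standard_normal((d_model, vocab_size))

pos, neg = top_logit_effects(w_dec, w_u, feature_idx=3, k=5)
for tok, score in pos:
    print(f"+ token {tok}: {score:.2f}")
for tok, score in neg:
    print(f"- token {tok}: {score:.2f}")
```

With real weights, the returned token ids would then be decoded with the GPT-2 tokenizer to get strings like those in the tables above.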

Activation Density: 0.078%

No Known Activations
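Activation density is the share of tokens on which the feature fires. Assuming it is measured as the fraction of sampled token positions with a positive activation (the dataset and threshold behind this page's figure are not stated here), a density of 0.078% corresponds to roughly 78 firing positions per 100,000 tokens. A minimal sketch with a hypothetical `activation_density` helper and synthetic activations:

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Synthetic activations: 100,000 token positions, the feature fires on 78 of them.
rng = np.random.default_rng(0)
acts = np.zeros(100_000)
firing = rng.choice(acts.size, size=78, replace=False)  # distinct positions
acts[firing] = rng.uniform(0.1, 5.0, size=78)  # strictly positive activations

density = activation_density(acts)
print(f"{density:.3%}")  # → 0.078%
```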