© Neuronpedia 2026


Neuronpedia



GPT2-Small · Transcoders Residuals · 8-TRES-DC · Index 1144

Explanations

expressions related to the act of lifting or raising

oai_token-act-pair · gpt-4o-mini Triggered by @bot


Negative Logits

Token        Logit
"��"         -0.81
"aired"      -0.68
"vised"      -0.67
"hots"       -0.64
"anian"      -0.64
"vertis"     -0.62
"orp"        -0.62
"episode"    -0.61
"isphere"    -0.61
"roe"        -0.59

Positive Logits

Token              Logit
" restrictions"    0.76
" inhib"           0.76
"heads"            0.75
"IELD"             0.74
" Limits"          0.73
" Cert"            0.73
" barriers"        0.71
"vale"             0.69
"head"             0.68
" privileges"      0.67

Activations Density: 0.073%

No Known Activations