© Neuronpedia 2026


Neuronpedia


GPT2-Small · Transcoders Residuals · 8-TRES-DC · Index 859

Explanations

negative descriptors and critical assessments of behavior or situations

oai_token-act-pair · gpt-4o-mini Triggered by @bot


Negative Logits

Token        Logit
�            -0.64
�            -0.61
ogn          -0.59
upt          -0.59
�            -0.56
��           -0.56
ANG          -0.55
arsh         -0.54
�            -0.54
omorphic     -0.54

Positive Logits

Token           Logit
and             0.88
etc             0.86
(not rendered)  0.86
pmwiki          0.81
...)            0.68
(not rendered)  0.67
…)              0.63
AND             0.62
or              0.61
and             0.58

Activations Density 0.284%

No Known Activations