Explanations
Negation
This neuron activates on words that signal a missing action or an error: negations and contrastive terms (e.g., “but,” “never”) that highlight things not being done.
Negative Logits
Token        Logit
stras        -0.06
éra          -0.06
student      -0.06
Cart         -0.06
.var         -0.06
563          -0.06
noodles      -0.06
ให           -0.06
Patient      -0.06
Ala          -0.06
Positive Logits
Token        Logit
Bri          0.07
presumed     0.07
लगभग         0.07
gri          0.07
предполаг    0.07
/File        0.06
khu          0.06
conceivable  0.06
후기         0.06
fh           0.06
Activations Density: 0.014%
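The page does not say how the density figure is computed. A minimal sketch follows, assuming it is the fraction of token positions on which the neuron's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce a value of 0.014%.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron
    fires at all. The dashboard itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 14 of them active.
sample = [0.0] * 99_986 + [1.5] * 14
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density this low means the neuron is silent on almost all tokens, which would fit a feature tied to a narrow class of words such as negations.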