INDEX
Explanations
This neuron activates on the conjunction “but” in its contrastive (adversative) uses.
Negative Logits
madness   -0.07
-Con      -0.06
gef       -0.06
Circle    -0.06
од        -0.06
Came      -0.06
fax       -0.06
pants     -0.06
DATE      -0.06
STORY     -0.06
Positive Logits
Haj               0.07
Router            0.06
superheroes       0.06
εμπ               0.06
_bm               0.06
altered           0.06
binaries          0.06
Webb              0.06
ConstraintMaker   0.06
explorer          0.06
Activations Density 0.059%
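
The density figure above is usually read as the share of token positions on which the neuron fires at all; at 0.059% that is roughly one token in 1,700. The sketch below shows one way such a number could be computed. It is an illustration only: the function name `activation_density`, the zero threshold, and the toy data are assumptions, not the dashboard's actual implementation.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    `activations` is a flat list of per-token activation values for one neuron.
    The zero threshold is an assumption; a dashboard may use a different cutoff.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 10,000 token positions, 6 of which fire.
toy = [0.0] * 9994 + [1.3, 0.4, 2.1, 0.7, 0.9, 3.2]
density = activation_density(toy)
print(f"{density:.3%}")
```

A density this low fits a neuron tied to a single word in a specific usage: most tokens in a corpus are not a contrastive “but”, so most positions contribute zero.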