INDEX
Explanations
stop words
The neuron activates on sentence-initial tokens, especially capitalized transition words that open a new sentence (e.g. “In,” “Next,” “This,” “That’s”).
Negative Logits
mockery     -0.07
EN          -0.07
_FINAL      -0.06
(){}↵       -0.06
быть        -0.06
coupling    -0.06
________________________________________________________________  -0.06
+ ↵         -0.06
peanut      -0.06
.weixin     -0.06
Positive Logits
клуб        0.07
既          0.06
stdafx      0.06
titular     0.06
‰           0.06
Prahy       0.06
непосред    0.06
ヾ          0.06
Dickinson   0.06
kodu        0.06
Activations Density 0.192%
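
A figure like this one is commonly read as the fraction of token positions on which the neuron fires at all. The sketch below shows one way such a number might be computed. It assumes that "density" means the share of positions with activation above a threshold, which the page itself does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (# active positions) / (# positions). The source page
    does not define the metric, so treat this as an illustrative reading only.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 19 of which have nonzero activation.
sample = [0.0] * 9981 + [1.5] * 19
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.192% would mean the neuron is active on roughly 2 of every 1,000 tokens, which fits a feature tied to a narrow position such as the start of a sentence.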