Explanations
This neuron primarily fires on uppercase abbreviations or acronyms (short all-caps tokens).
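The explanation "short all-caps tokens" can be turned into a testable rule. The sketch below is a hypothetical heuristic, not the auto-interp method behind this page. The function name `is_short_allcaps` and the length cutoff `max_len=4` are illustrative assumptions. It is applied here to the positive-logit tokens listed on this page.

```python
def is_short_allcaps(token: str, max_len: int = 4) -> bool:
    """Hypothetical heuristic: True for short tokens made only of uppercase letters.

    max_len is an assumed cutoff for "short"; adjust as needed.
    """
    stripped = token.strip()
    # Must be non-empty, letters only, all uppercase, and within the length cutoff.
    return 0 < len(stripped) <= max_len and stripped.isalpha() and stripped.isupper()


# Tokens copied from the positive-logit list on this page.
positive_tokens = ["DN", ">If", "owning", "FG", "ằm", "OW", "JP", "WS", "MB", "Sın"]
matches = [t for t in positive_tokens if is_short_allcaps(t)]
print(matches)  # ['DN', 'FG', 'OW', 'JP', 'WS', 'MB']
```

Six of the ten top positive-logit tokens pass this rule, which is consistent with the explanation; tokens such as "owning" and "Sın" do not.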
Negative Logits
avr      -0.08
πη       -0.07
_coef    -0.07
EFA      -0.07
plasma   -0.07
IK       -0.06
merc     -0.06
�        -0.06
OMEM     -0.06
790      -0.06

Positive Logits
DN       0.07
>If      0.07
owning   0.07
FG       0.06
ằm       0.06
OW       0.06
JP       0.06
WS       0.06
MB       0.06
Sın      0.06
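Lists like the two above are commonly produced by projecting a neuron's output weights through the model's unembedding matrix and sorting the result. The sketch below assumes that direct-projection approach (ignoring any final layer norm); the page does not state how its values were computed. The names `w_out`, `W_U`, and the toy numbers are hypothetical and are not this neuron's real weights.

```python
import math
from typing import Dict, List, Tuple


def neuron_logit_effects(w_out: List[float], W_U: List[List[float]],
                         vocab: List[str]) -> Dict[str, float]:
    """Direct effect of one neuron on each token's logit: w_out · W_U[:, t].

    w_out: the neuron's output weight vector (length d_model).
    W_U:   unembedding matrix, shape d_model x vocab_size (assumed layout).
    """
    effects = {}
    for t, tok in enumerate(vocab):
        effects[tok] = sum(w_out[d] * W_U[d][t] for d in range(len(w_out)))
    return effects


def top_k(effects: Dict[str, float], k: int, largest: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (or smallest) logit effect."""
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=largest)[:k]


# Toy example: d_model = 2, four-token vocabulary (all values made up).
vocab = ["DN", "plasma", "FG", "avr"]
w_out = [0.5, -0.2]
W_U = [[0.1, -0.1, 0.08, -0.12],
       [-0.1, 0.05, 0.0, 0.1]]

effects = neuron_logit_effects(w_out, W_U, vocab)
print(top_k(effects, 2))                 # most promoted tokens: DN, then FG
print(top_k(effects, 2, largest=False))  # most suppressed tokens: avr, then plasma
```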

Activation Density: 0.195%