Explanations
This neuron activates on negated effects: phrases stating that something “did not” occur or had “no” effect.
Negative Logits
소  -0.06
undles  -0.06
Dalton  -0.06
uforia  -0.06
abyss  -0.06
alumnos  -0.06
えて  -0.06
ثمان  -0.06
dbe  -0.06
Footer  -0.06
Positive Logits
ㅋㅋ  0.07
ponge  0.06
Turkish  0.06
./  0.06
typename  0.06
.Decimal  0.06
popul  0.06
�  0.06
breadcrumbs  0.06
раниц  0.06
Activations Density 0.026%