Explanations
The neuron activates on words that name arithmetic operations, such as “times,” “multiplied,” “take away,” “minus,” and “plus.”
Negative Logits
loggedin  -0.07
粉  -0.06
Sunday  -0.06
IntelliJ  -0.06
Только  -0.06
asserts  -0.06
好  -0.06
み  -0.06
unny  -0.06
Brit  -0.06
Positive Logits
_vel  0.06
غراف  0.06
Atmos  0.06
jal  0.06
(op  0.06
ENV  0.06
profund  0.06
POS  0.06
�  0.06
impeccable  0.06
Activations Density 0.016%