Explanations
This neuron consistently activates on numeric tokens (especially decimal numbers and years).
Negative Logits
377        -0.07
JWT        -0.06
800        -0.06
ejména     -0.06
eroon      -0.06
гот        -0.06
Null       -0.06
hashtag    -0.06
vů         -0.06
SSP        -0.06
Positive Logits
threatens      0.07
Declarations   0.06
Spread         0.06
Music          0.06
Evil           0.06
-social        0.06
-esque         0.06
yz             0.06
Qualified      0.06
گیری           0.06
Activations Density 0.028%
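
For readers unfamiliar with the density figure, the following is a minimal sketch of one common way to compute it, assuming "density" means the percentage of sampled tokens on which the neuron's activation exceeds a threshold. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard itself.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density is defined as (active tokens / total tokens) * 100.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 10,000 tokens, of which 3 have a nonzero activation.
sample = [0.0] * 9997 + [1.2, 0.4, 2.5]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.028% would correspond to the neuron firing on roughly 28 out of every 100,000 tokens.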