Explanations
special characters
This neuron detects rare or nonstandard tokens, especially unusual symbols, foreign-language fragments, and garbled or malformed character sequences.
Negative Logits
Token           Logit
textbook        -0.06
certification   -0.06
::::/           -0.06
nồi             -0.06
(tweet          -0.06
丶              -0.06
atomic          -0.06
theology        -0.06
Interaction     -0.06
allergy         -0.06
Positive Logits
Token           Logit
��              0.08
mur             0.08
�               0.08
ha              0.07
�t              0.07
ে               0.07
�               0.07
.Exp            0.06
�               0.06
edia            0.06
Activations Density 0.144%
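
The density figure above can be read as the share of sampled token positions on which the neuron fires. The sketch below illustrates that reading under stated assumptions: the page does not say how the value was computed, so the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a nonzero
    (above-threshold) activation; the source page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 14 of them active.
sample = [0.0] * 9986 + [1.5] * 14
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the neuron is silent on almost all tokens, which fits a detector for rare or malformed character sequences.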