Explanations
historical events
This neuron consistently activates on non-English, Romance-language tokens (words or subwords from Italian or Portuguese text), suggesting that it detects when the text switches out of English.
Negative Logits
_STENCIL	-0.07
commitments	-0.07
dm	-0.07
سود	-0.07
Zend	-0.07
061	-0.06
promoters	-0.06
мор	-0.06
/reset	-0.06
_HELPER	-0.06
Positive Logits
(blank	0.07
.onerror	0.06
#-}↵↵	0.06
ویش	0.06
buffer	0.06
\"\	0.06
[next	0.06
ULONG	0.06
''' ↵	0.06
("--	0.06
Activations Density 0.033%