INDEX
Explanations
Words ending in certain suffixes
The neuron fires on Cyrillic‐script words, effectively detecting Russian‐language text.
Negative Logits
i      -0.09
ke     -0.09
E      -0.09
ane    -0.09
CE     -0.08
me     -0.08
э      -0.08
e      -0.08
FE     -0.08
I      -0.08
Positive Logits
on     0.16
ON     0.14
or     0.12
son    0.11
don    0.11
don    0.11
DON    0.11
ton    0.11
SON    0.10
elon   0.10
Activations Density: 0.305%