INDEX
Explanations
instances of punctuation and specific pronouns
Negative Logits
Token      Logit
lom        -0.18
оÑı        -0.15
ibal       -0.14
Ñıл        -0.14
ido        -0.14
clid       -0.14
'L         -0.14
ounsel     -0.13
елÑı       -0.13
trous      -0.13
Positive Logits
Token      Logit
_launcher  0.15
ohl        0.15
Styles     0.15
antu       0.14
enu        0.14
leur       0.13
axy        0.13
Congress   0.13
ospace     0.13
utr        0.13
Activations Density: 0.004%