Explanations
punctuation marks and formatting indicators in text
Negative Logits
presses    -0.17
bane       -0.15
villa      -0.15
典         -0.15
rut        -0.14
Yer        -0.14
pressed    -0.14
kehr       -0.14
gar        -0.14
Press      -0.14
Positive Logits
Santos     0.15
dej        0.15
addOn      0.15
acock      0.14
QUIT       0.14
дон        0.14
prolong    0.14
lesb       0.14
운         0.14
είτε       0.14
Activations Density 0.007%