Explanations
punctuation marks and their variations
Negative Logits
ardin     -0.18
unal      -0.17
ahoma     -0.17
vide      -0.17
arden     -0.15
billig    -0.15
ikel      -0.15
ypad      -0.15
atten     -0.14
ÙĤÙħ      -0.14
Positive Logits
897       0.18
iban      0.15
deaux     0.15
onio      0.15
Ban       0.14
-toggler  0.14
çħ§       0.14
pis       0.14
å®        0.14
ç«        0.14
Activations Density: 0.030%