INDEX
Explanations
the presence of numerical values and punctuation in text
Negative Logits
kdir        -0.16
аÑĤив       -0.16
htt         -0.15
ereum       -0.15
Caval       -0.15
atives      -0.14
ÃĹ</        -0.14
úsqueda     -0.14
648         -0.14
665         -0.14
Positive Logits
IDEOS       0.16
dek         0.15
.gray       0.15
ÙĦس         0.15
ham         0.14
illing      0.14
544         0.14
Madonna     0.13
onec        0.13
subs        0.13
Activations Density: 0.004%