Explanations
Specific special characters or diacritics in text
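Token strings for accented or non-Latin text are often displayed in raw byte-level form, for example `ÄŁÃ¼` for the Turkish fragment `ğü`. The sketch below assumes the model uses a GPT-2-style byte-level BPE tokenizer, which this page does not state. It rebuilds the byte-to-printable-character table used by GPT-2's `bytes_to_unicode` and inverts it so a displayed token can be turned back into UTF-8 text. `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode() -> dict:
    """Rebuild the GPT-2 style table mapping each raw byte to a printable character."""
    # Bytes that are already printable keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte (control chars, space, etc.) is shifted to U+0100 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Invert the table so each displayed character maps back to its raw byte.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE token string into readable UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["ÄŁÃ¼", "ÙĨÙĩ", "Ġcher"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, `Ġ` is the byte for a leading space, and two-character pairs such as `ÙĨ` are the two bytes of a single Arabic letter.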
Negative Logits
bane      -0.17
ago       -0.16
دهم       -0.15
anges     -0.15
ré        -0.14
enames    -0.14
ugi       -0.14
inspect   -0.14
eness     -0.14
atur      -0.14
Positive Logits
cher      0.16
انه       0.15
Pert      0.15
ffen      0.15
ieux      0.15
ött       0.14
glm       0.14
edy       0.14
lichen    0.14
ğü        0.14
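The negative and positive logit lists rank vocabulary tokens by how strongly this feature directly pushes their logits down or up. A common way to produce such lists, assumed here rather than confirmed by the page, is to multiply the feature's decoder direction by the model's unembedding matrix and keep the bottom-k and top-k scores. The sketch uses plain Python lists with toy numbers; `feature_logit_effects` is a hypothetical helper, and real dashboards may also apply normalization that is omitted here.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],  # shape [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Project a feature's decoder direction through the unembedding.

    Returns (most negative, most positive) lists of (token, score) pairs.
    """
    d_model = len(decoder_direction)
    # Score for token v is the dot product of the direction with column v of W_U.
    scores = [
        sum(decoder_direction[i] * unembed[i][v] for i in range(d_model))
        for v in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # smallest (most negative) scores first
    positive = ranked[::-1][:k]  # largest (most positive) scores first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
direction = [1.0, -0.5]
W_U = [[0.2, -0.1, 0.0],
       [0.1, 0.3, -0.2]]
neg, pos = feature_logit_effects(direction, W_U, ["a", "b", "c"], k=1)
print("negative:", neg)
print("positive:", pos)
```

Here token `b` scores about -0.25 and token `a` about 0.15, so they head the negative and positive lists respectively.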
Activation density: 0.024%
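Activation density is the fraction of tokens on which the feature is active. A density of 0.024% corresponds to about 24 firing tokens per 100,000, or roughly 1 in 4,170. A minimal sketch, assuming "active" means an activation strictly above a threshold of zero (the page does not state its criterion); `activation_density` is a hypothetical helper:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data: 24 nonzero activations among 100,000 tokens.
acts = [0.0] * 99_976 + [1.3] * 24
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.024%
```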