INDEX
Explanations
characters from non-Latin alphabets along with some English words and phrases, possibly related to historical or cultural contexts
special characters or symbols, particularly from non-Latin scripts
Negative Logits
Token      Logit
nels       -0.77
dope       -0.77
agre       -0.71
sterdam    -0.70
Blitz      -0.69
nyder      -0.68
nell       -0.66
eln        -0.66
ftime      -0.66
Miss       -0.64
Positive Logits
Token      Logit
ŃĶ         1.18
ľ          1.05
ł          1.05
ĵ          1.04
ĨĴ         1.02
«ĺ         1.02
Ļ          1.02
Ĵ          1.00
ħ          1.00
å          0.99
Activations Density: 0.011%
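
The positive-logit tokens listed above look like byte-level BPE display characters rather than real letters. The sketch below assumes (the page does not say so) that the underlying tokenizer uses the GPT-2 style byte-to-unicode table, and maps each displayed token back to its raw bytes. The helper names `bytes_to_unicode`, `token_to_bytes`, and `byte_kinds` are illustrative, not part of any dashboard API. Under that assumption most of these tokens decode to UTF-8 continuation bytes (0x80 to 0xBF), which would fit the explanation that the feature fires on non-Latin scripts.

```python
def bytes_to_unicode():
    """Rebuild a GPT-2 style byte -> display-character table (assumed tokenizer)."""
    # Bytes that are shown as themselves.
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    codepoints = byte_values[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n, in byte order.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            codepoints.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, codepoints)}


# Inverse table: display character -> raw byte value.
DISPLAY_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Map a displayed token string back to the raw bytes it stands for."""
    return bytes(DISPLAY_TO_BYTE[ch] for ch in token)


def byte_kinds(raw: bytes) -> list:
    """Label each byte by its role in UTF-8."""
    kinds = []
    for b in raw:
        if b < 0x80:
            kinds.append("ascii")
        elif b < 0xC0:
            kinds.append("continuation")
        else:
            kinds.append("lead")
    return kinds


if __name__ == "__main__":
    # Positive-logit tokens copied from the table above.
    for tok in ["ŃĶ", "ľ", "ł", "ĵ", "ĨĴ", "«ĺ", "Ļ", "Ĵ", "ħ", "å"]:
        raw = token_to_bytes(tok)
        print(tok, raw.hex(), byte_kinds(raw))
```

If the tokenizer is not GPT-2 style, the mapping does not apply and the tokens should be read as literal characters.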