Explanations
occurrences of non-Latin characters or symbols
Negative Logits
Token     Logit
é         -0.16
es        -0.15
arella    -0.15
Bull      -0.14
xd        -0.14
weak      -0.14
Ħĸ        -0.14
ow        -0.14
æ¼        -0.14
orsk      -0.14
Positive Logits
Token     Logit
ĺ         0.35
Ĵ         0.28
Ļ         0.27
ļ         0.26
Ľ         0.25
ķ         0.21
ambre     0.18
IJ        0.17
¡         0.17
ł         0.16
Activations Density: 0.004%