INDEX
Explanations
proper nouns, particularly names of people
Negative Logits
ijd         -0.16
ео          -0.15
áž          -0.15
евÑĸ        -0.15
obsolete    -0.14
pty         -0.14
å§Ķ         -0.14
oland       -0.13
.misc       -0.13
ripsi       -0.13
Positive Logits
tos         0.15
DPR         0.14
unofficial  0.13
moh         0.13
rien        0.13
OCI         0.13
ories       0.13
ŀ           0.13
OTES        0.13
Haram       0.13
Activations Density 0.191%