INDEX
Explanations
proper nouns, especially names of individuals
Negative Logits
abant    -0.17
kho      -0.16
inium    -0.15
ceph     -0.15
loo      -0.15
زان      -0.15
.ls      -0.14
ilis     -0.14
وان      -0.14
Scores   -0.14
Positive Logits
alfa      0.20
儀        0.18
/stretch  0.15
Güven     0.15
ADO       0.14
屆        0.14
ル        0.14
ONO       0.14
dsa       0.14
summ      0.13
Activations Density 0.001%