INDEX
Explanations
proper nouns, particularly names of individuals
Negative Logits
Token          Logit
paksa          -0.62
Хьажоргаш      -0.61
ⓧ              -0.58
евич           -0.57
xito           -0.57
verket         -0.56
témoins        -0.56
eingestellt    -0.55
(blank)        -0.55
Allgemeinen    -0.55
Positive Logits
Token          Logit
Sack           0.66
Mad            0.63
kaarangay      0.62
Mad            0.55
ofire          0.54
تضيفلها        0.54
AccessLevel    0.51
numerator      0.51
Sher           0.51
Neg            0.50
Activations Density: 0.215%