Explanations
references to political controversies and government actions
Negative Logits
connexion    -0.16
oler         -0.15
meld         -0.15
901          -0.15
usty         -0.15
weighting    -0.15
renown       -0.15
regardless   -0.14
uel          -0.14
itary        -0.14
Positive Logits
till         0.21
hog          0.21
Till         0.19
compuls      0.18
suo          0.18
derec        0.18
alg          0.17
Witness      0.15
bag          0.15
onz          0.15
Activation Density: 0.125%