Explanations
political figures and organizations
names of people and organizations
Negative Logits
Token        Logit
enment       -0.58
Debor        -0.53
denomin      -0.52
agher        -0.51
attest       -0.51
wealth       -0.51
toile        -0.48
sugg         -0.48
.$           -0.46
cms          -0.45
Positive Logits
Token        Logit
atism         0.54
cheat         0.49
ropri         0.48
ahu           0.47
acial         0.47
seless        0.47
shouldn       0.46
should        0.46
hematically   0.45
bably         0.45
Activation density: 1.208%