Explanations
references to political opposition parties or movements
Negative Logits
ings        -0.07
eskort      -0.07
opp         -0.07
Freund      -0.07
warz        -0.07
igans       -0.07
ystone      -0.07
æķµ         -0.07
ey          -0.07
lius        -0.07
Positive Logits
al           0.12
leader       0.10
aire         0.10
Leader       0.09
-minded      0.09
groups       0.08
-leaning     0.08
naire        0.08
naires       0.08
figure       0.07
Activation Density: 0.005%
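The dashboard does not say how these numbers are produced. A minimal sketch follows, under one assumption: the feature uses the convention common to sparse-autoencoder dashboards. Under that convention, a token's logit effect is the dot product of the feature's decoder direction with that token's unembedding column. Activation density is the fraction of token positions where the feature's activation exceeds zero. The function names, the threshold, and the toy inputs are hypothetical illustrations, not the dashboard's actual code. For scale, a density of 0.005% means the feature fires on roughly 1 in 20,000 tokens.

```python
def logit_effects(decoder_dir, unembed_cols):
    """Hypothetical: dot the feature's decoder direction with each token's unembedding column."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed_cols.items()
    }


def top_logit_effects(effects, k=10):
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # smallest effects first
    positive = ranked[::-1][:k]    # largest effects first
    return negative, positive


def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Toy 2-dimensional example; real models use thousands of dimensions and tokens.
    effects = logit_effects(
        [1.0, 0.0],
        {"leader": [0.10, 5.0], "opp": [-0.07, 1.0], "groups": [0.08, -3.0]},
    )
    neg, pos = top_logit_effects(effects, k=1)
    print(neg, pos)
    # 5 active positions out of 100,000 gives 0.005%.
    print(f"{activation_density([0.0] * 99_995 + [1.2] * 5):.3%}")
```

Sorting the full vocabulary is the simplest approach. A real dashboard with a vocabulary of ~50k tokens might prefer a partial selection such as `heapq.nsmallest`/`heapq.nlargest` instead.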