INDEX
Explanations
topics related to political ideologies and dynamics
Negative Logits
uxxxx         -0.61
cérémonie     -0.57
adə           -0.57
addOn         -0.56
pilotes       -0.54
cittadini     -0.54
acús          -0.53
négociations  -0.53
hâte          -0.53
íqu           -0.53
Positive Logits
anti          1.25
liberal       1.18
radical       1.12
conservative  1.10
liberal       0.98
liberalism    0.98
conservative  0.98
neo           0.96
extremist     0.95
libertarian   0.95
Activations Density 0.660%