INDEX
Explanations
terms related to political ideologies and affiliations
Negative Logits
Token     Logit
scape     -0.17
USR       -0.15
ffect     -0.15
ings      -0.14
यन        -0.14
verage    -0.14
Bucc      -0.14
pez       -0.14
ipers     -0.14
elon      -0.14
Positive Logits
Token        Logit
-leaning     0.39
leaning      0.32
lean         0.29
leaning      0.28
/left        0.25
-lite        0.22
/lib         0.21
-minded      0.21
tendencies   0.21
lean         0.20
Activations Density: 0.122%