Explanations
Terms related to political ideologies, specifically those distinguishing between liberal and conservative viewpoints.
Negative Logits
lasses        -0.15
kin           -0.14
hsi           -0.14
tweet         -0.14
ollar         -0.14
itra          -0.14
hal           -0.14
uci           -0.14
jím           -0.13
累            -0.13
Positive Logits
unker          0.17
-Christian     0.15
ograd          0.15
-leaning       0.15
/lib           0.15
/social        0.15
credentials    0.14
zac            0.14
/mod           0.14
veget          0.14
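Lists like the negative and positive logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. The sketch below illustrates that idea under stated assumptions: the feature's direct logit effect is taken to be the dot product of its decoder vector with each unembedding column, and the names `logit_effects`, `top_and_bottom`, `decoder_vec`, and `w_u` are hypothetical, not taken from any particular dashboard's code.

```python
from typing import List, Tuple


def logit_effects(decoder_vec: List[float], w_u: List[List[float]]) -> List[float]:
    """Hypothetical helper: direct logit effect of one feature direction.

    Assumes w_u is the unembedding matrix laid out as [d_model][vocab],
    so the score for token t is sum_i decoder_vec[i] * w_u[i][t].
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_vec[i] * w_u[i][t] for i in range(len(decoder_vec)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    scores: List[float], tokens: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted and k most-suppressed (token, score) pairs."""
    pairs = list(zip(tokens, scores))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative
```

With a toy two-dimensional model, `logit_effects([1.0, 0.5], [[0.2, -0.4, 0.0], [0.2, 0.0, -0.6]])` scores three tokens at roughly 0.3, -0.4, and -0.3, so the first token tops the positive list and the second tops the negative list.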
Activation Density: 0.029%
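Activation density is usually read as the fraction of token positions on which the feature fires. A density of 0.029% therefore means the feature is active on about 29 of every 100,000 tokens. The minimal sketch below computes that fraction; the function name and the assumption that "fires" means an activation strictly above a threshold of zero are illustrative choices, not details confirmed by this dashboard.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of token positions whose activation exceeds threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)
```

Multiplying the result by 100 gives the percentage shown above, so 29 firing positions out of 100,000 yields 0.00029, or 0.029%.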