Explanations
terms related to political ideologies and affiliations, particularly conservative and liberal perspectives
Head Attribution Weights
Head 0: 0.22
Head 1: 0.03
Head 2: 0.07
Head 3: 0.02
Head 4: 0.07
Head 5: 0.04
Head 6: 0.17
Head 7: 0.05
Head 8: 0.04
Head 9: 0.03
Head 10: 0.14
Head 11: 0.08
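The weights above are easier to read when ranked. This is a minimal sketch that assumes each number is the weight attributed to the attention head with that index (0 to 11). The names `HEAD_ATTR_WEIGHTS`, `top_heads` and `share_of_top` are hypothetical helpers for illustration only, not part of any dashboard API. Note that the listed values sum to 0.96 rather than 1.0, possibly because each is rounded to two decimals, so the share is computed against the listed total.

```python
# Head attribution weights copied from the list above (head index -> weight).
# Assumption: each value is the share of the feature attributed to that head.
HEAD_ATTR_WEIGHTS = {
    0: 0.22, 1: 0.03, 2: 0.07, 3: 0.02, 4: 0.07, 5: 0.04,
    6: 0.17, 7: 0.05, 8: 0.04, 9: 0.03, 10: 0.14, 11: 0.08,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weight, largest first."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]


def share_of_top(weights, k=3):
    """Fraction of the total listed weight carried by the top-k heads."""
    total = sum(weights.values())
    return sum(w for _, w in top_heads(weights, k)) / total


if __name__ == "__main__":
    print(top_heads(HEAD_ATTR_WEIGHTS))
    print(round(sum(HEAD_ATTR_WEIGHTS.values()), 2))
    print(round(share_of_top(HEAD_ATTR_WEIGHTS), 3))
```

Under that reading, heads 0, 6 and 10 together carry a little over half of the listed weight.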
Negative Logits
Gou          -1.13
Minotaur     -1.09
Ambro        -1.06
Kik          -1.04
Tsu          -1.03
simulator    -1.02
Array        -1.02
Finder       -1.01
auction      -1.01
Tunnel       -1.00
Positive Logits
ervative      1.34
galitarian    1.29
meric         1.26
partisans     1.25
gressive      1.23
ervatives     1.21
idy           1.20
riors         1.19
creed         1.18
hemy          1.17
Activation Density: 0.061%
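To put the density figure in concrete terms, the sketch below converts a percentage into "one activation per N tokens". It assumes that density means the fraction of tokens in the dashboard's sample on which the feature is active; `tokens_per_activation` is a hypothetical helper, not a library function.

```python
def tokens_per_activation(density_percent):
    """Convert an activation density in percent into 'one firing per N tokens'.

    Assumption: density is the fraction of sampled tokens where the feature is active.
    """
    fraction = density_percent / 100.0  # 0.061% -> 0.00061
    return 1.0 / fraction


if __name__ == "__main__":
    print(f"{tokens_per_activation(0.061):.0f}")
```

Under that assumption, a density of 0.061% corresponds to the feature firing on roughly one token in 1,640, i.e. it is a sparse feature.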