Explanations
discussions about political opinions and polling
Head Attr Weights
0:0.03
1:0.03
2:0.06
3:0.20
4:0.04
5:0.03
6:0.11
7:0.20
8:0.06
9:0.04
10:0.05
11:0.11
Negative Logits
trak               -1.45
hesda              -1.26
anguages           -1.23
natureconservancy  -1.19
ulo                -1.16
iets               -1.16
iannopoulos        -1.15
pects              -1.09
Rhythm             -1.09
+---               -1.08
Positive Logits
cloth       1.07
pret        1.05
gered       1.04
towels      1.04
Eggs        1.02
whe         1.01
mine        1.00
��          1.00
cigarettes  0.99
laundry     0.99
Activations Density 0.001%