INDEX
Explanations
phrases related to political actions or positions
Head Attr Weights
0:0.04
1:0.04
2:0.12
3:0.08
4:0.03
5:0.15
6:0.20
7:0.04
8:0.07
9:0.09
10:0.06
11:0.03
Negative Logits
Pix       -1.35
})        -1.32
($)       -1.26
-->       -1.26
...)      -1.23
});       -1.19
Osiris    -1.15
Brawl     -1.15
CLR       -1.15
Examiner  -1.14
Positive Logits
"))       1.78
ONSORED   1.55
ajo       1.40
usalem    1.40
jew       1.39
milo      1.39
vae       1.39
igious    1.34
uez       1.30
daq       1.30
Activations Density: 0.002%