INDEX
Explanations
phrases associated with controversial or polarizing political debates and identities
Head Attr Weights
0:0.02
1:0.01
2:0.06
3:0.33
4:0.01
5:0.02
6:0.07
7:0.11
8:0.06
9:0.11
10:0.05
11:0.09
Negative Logits
ngth:-1.45
ITNESS:-1.43
eele:-1.33
fty:-1.30
athered:-1.27
eller:-1.26
aternity:-1.23
ruary:-1.21
Creat:-1.19
ellar:-1.17
Positive Logits
buffalo:1.18
Bengal:1.17
zombies:1.15
psycho:1.15
overdose:1.14
adesh:1.11
altogether:1.10
offence:1.09
Sharks:1.09
heroin:1.09
Activations Density 0.034%