INDEX
Explanations
phrases related to public and media discourse on controversial topics
Head Attr Weights
0:0.02
1:0.02
2:0.06
3:0.05
4:0.06
5:0.02
6:0.48
7:0.04
8:0.04
9:0.03
10:0.06
11:0.04
Negative Logits
ortium              -1.76
lished              -1.73
ificantly           -1.61
anwhile             -1.59
GOODMAN             -1.57
assium              -1.56
etheus              -1.56
EStream             -1.51
isSpecialOrderable  -1.45
theless             -1.44
Positive Logits
sounding    1.45
folk        1.36
glers       1.36
smanship    1.35
national    1.28
bur         1.28
ocratic     1.28
aggressive  1.24
isms        1.24
ilian       1.24
Activations Density 0.861%