INDEX
Explanations
sentences conveying negative news or unfavorable outcomes
Head Attr Weights
0:0.22
1:0.09
2:0.04
3:0.06
4:0.04
5:0.07
6:0.03
7:0.02
8:0.22
9:0.06
10:0.06
11:0.04
Negative Logits
govtrack: -2.16
coerc: -2.02
ga: -1.88
adobe: -1.87
xual: -1.80
RELEASE: -1.74
usc: -1.72
Amit: -1.71
ascending: -1.69
coh: -1.68
Positive Logits
iev: 1.80
Secondly: 1.78
Roses: 1.71
ry: 1.70
essim: 1.68
ruce: 1.67
abor: 1.67
remlin: 1.64
rors: 1.60
esters: 1.59
Activations Density 0.000%