INDEX
Explanations
negation or denial phrases
Head Attr Weights
0:0.09
1:0.07
2:0.08
3:0.08
4:0.08
5:0.08
6:0.07
7:0.08
8:0.08
9:0.08
10:0.07
11:0.08
Negative Logits
inav        -1.85
drawn       -1.66
sidx        -1.64
�           -1.60
icol        -1.57
jiang       -1.56
Franch      -1.56
enne        -1.53
Regions     -1.51
favorite    -1.50
Positive Logits
(token not shown)    1.70
Burnett              1.45
unequivocally        1.42
churn                1.38
%"                   1.37
abortion             1.34
independently        1.33
creep                1.31
"—                   1.31
.,"                  1.30
Activations Density 0.000%