INDEX
Explanations
negations and expressions of uncertainty or lack of control
Head Attr Weights
0:0.02
1:0.05
2:0.03
3:0.05
4:0.03
5:0.03
6:0.31
7:0.03
8:0.02
9:0.06
10:0.06
11:0.26
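
The weights above can be read more easily once they are ranked. The following is a minimal sketch, assuming each `head:weight` pair gives the share of the feature's attribution assigned to that attention head; the names `head_weights` and `top_heads` are hypothetical and not part of the dashboard. The listed values are rounded, so they sum to 0.95 rather than exactly 1.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each weight is that head's share of the feature's attribution.
head_weights = {
    0: 0.02, 1: 0.05, 2: 0.03, 3: 0.05, 4: 0.03, 5: 0.03,
    6: 0.31, 7: 0.03, 8: 0.02, 9: 0.06, 10: 0.06, 11: 0.26,
}


def top_heads(weights, k=2):
    """Return the k (head, weight) pairs with the largest weight (hypothetical helper)."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


total = sum(head_weights.values())      # sum of the rounded weights
top = top_heads(head_weights)           # the two dominant heads
top_share = sum(w for _, w in top) / total

print(top)
print(round(total, 2), round(top_share, 2))
```

Under that reading, heads 6 and 11 together account for roughly 60% of the listed weight, and no other head exceeds 0.06.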
Negative Logits
Nelson      -2.86
Moons       -2.74
wine        -2.74
Kaine       -2.72
Stone       -2.69
Forth       -2.62
Richmond    -2.60
Strange     -2.60
,)          -2.59
imp         -2.59
Positive Logits
�           4.52
¨           3.58
»           3.52
�           3.25
´           3.20
cffffcc     3.20
?」         3.19
」          3.18
・          3.09
ּ           3.02
Activations Density 0.292%