INDEX
Explanations
personal pronouns and phrases indicating personal responsibility or perspective
Head Attr Weights
0:0.04
1:0.03
2:0.07
3:0.29
4:0.07
5:0.03
6:0.10
7:0.06
8:0.04
9:0.05
10:0.09
11:0.08
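A minimal sketch of how the weights above could be read, assuming each entry is `head index:weight` for attention heads 0 to 11 (the listing itself does not say so). The helper name `top_heads` is hypothetical and not part of any dashboard tooling; the values are copied from the listing.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: keys are attention-head indices and values are attribution shares.
head_attr_weights = {
    0: 0.04, 1: 0.03, 2: 0.07, 3: 0.29, 4: 0.07, 5: 0.03,
    6: 0.10, 7: 0.06, 8: 0.04, 9: 0.05, 10: 0.09, 11: 0.08,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weight, highest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Sum of the listed weights; the rounded values need not add up to exactly 1.
total = sum(head_attr_weights.values())

# The single head with the largest listed weight.
dominant_head, dominant_weight = top_heads(head_attr_weights, k=1)[0]

print(f"sum of listed weights: {total:.2f}")
print(f"dominant head: {dominant_head} (weight {dominant_weight:.2f})")
```

Under that reading, head 3 carries the largest single weight in the listing, well ahead of heads 6 and 10.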
Negative Logits
utterstock    -2.01
agric         -1.58
�             -1.54
Hels          -1.49
respectively  -1.48
duo           -1.45
neau          -1.43
berus         -1.39
版            -1.37
swick         -1.36
Positive Logits
..."    3.29
…"      3.08
.")     3.05
%"      2.84
..."    2.75
…"      2.71
!",     2.69
"'      2.68
!"      2.66
");     2.65
Activations Density 0.065%