INDEX
Explanations
verbs that indicate leadership or authorship
Head Attr Weights
0:0.02
1:0.10
2:0.18
3:0.02
4:0.02
5:0.06
6:0.16
7:0.07
8:0.15
9:0.04
10:0.09
11:0.03
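The list above can be read programmatically. The following is a minimal sketch, assuming each line is a `head index:weight` pair. The names `raw`, `parse_head_weights`, `head_weights`, `top_heads`, and `total` are illustrative and do not come from the dashboard. It parses the twelve entries, ranks the heads by weight, and sums the displayed values.

```python
# Head attribution weights copied verbatim from the list above.
# Assumption: each line is "<head index>:<weight>".
raw = """0:0.02
1:0.10
2:0.18
3:0.02
4:0.02
5:0.06
6:0.16
7:0.07
8:0.15
9:0.04
10:0.09
11:0.03"""


def parse_head_weights(text):
    """Return a dict mapping head index (int) to weight (float)."""
    weights = {}
    for line in text.splitlines():
        head, value = line.split(":")
        weights[int(head)] = float(value)
    return weights


head_weights = parse_head_weights(raw)

# The three heads with the largest displayed weights, highest first.
top_heads = sorted(head_weights, key=head_weights.get, reverse=True)[:3]

# Sum of the displayed (already rounded) weights, rounded to two decimals.
total = round(sum(head_weights.values()), 2)

print(top_heads, total)
```

With the values shown, heads 2, 6, and 8 carry the largest weights, and the displayed weights sum to 0.94. The dashboard does not say whether the weights are meant to sum to 1, and the displayed values are rounded to two decimals.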
Negative Logits
.",                 -1.15
Corsair             -1.08
gements             -1.05
yip                 -1.05
chwitz              -0.99
.?                  -0.99
guiActiveUnfocused  -0.97
glim                -0.94
Which               -0.94
relevant            -0.94
Positive Logits
herself     1.26
edly        1.20
ェ          1.15
Sandra      1.13
himself     1.13
uther       1.08
Poc         1.02
Cosponsors  1.01
�           1.01
版          1.01
Activations Density: 0.096%