Explanations
words indicating movement or progression
Head Attr Weights
0:0.08
1:0.07
2:0.09
3:0.08
4:0.11
5:0.06
6:0.06
7:0.08
8:0.08
9:0.07
10:0.07
11:0.08
Negative Logits
�       -2.67
使      -2.33
��      -2.31
�       -2.26
�       -2.22
LIN     -2.13
LM      -2.13
Kamp    -2.12
�       -2.11
Ri      -2.09
Positive Logits
minds         2.21
shareholders  2.12
drown         2.09
shareholder   2.08
aneous        2.01
doubtless     1.99
nervous       1.96
[/            1.94
attribution   1.93
foreseeable   1.90
Activations Density 0.000%