Explanations
phrases indicating preparation or anticipation
Head Attribution Weights
Head  Weight
0     0.01
1     0.02
2     0.10
3     0.10
4     0.09
5     0.01
6     0.07
7     0.33
8     0.03
9     0.02
10    0.07
11    0.08
Negative Logits
Token       Logit
rers        -1.76
cause       -1.71
cro         -1.66
matter      -1.61
hack        -1.60
ationally   -1.55
alties      -1.48
iencies     -1.45
link        -1.45
DOI         -1.44
Positive Logits
Token                 Logit
rehears               1.69
farewell              1.65
paren                 1.61
preparations          1.56
[undecodable token]   1.53
fuss                  1.50
estern                1.46
prom                  1.45
anticipation          1.36
Prepar                1.36
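Token lists like the negative and positive logits above are commonly produced by projecting a latent's decoder direction onto the model's unembedding matrix and reading off the most negative and most positive entries. This page does not state how these particular values were computed, so the following is only a minimal sketch under that assumption. It ignores any final LayerNorm, and `direction`, `W_U`, `vocab`, and `top_logit_tokens` are hypothetical names, not part of any specific library.

```python
import numpy as np

def top_logit_tokens(direction, W_U, vocab, k=10):
    """Rank vocabulary tokens by the direct effect of a latent's direction on the logits.

    direction: (d_model,) decoder vector for the latent (hypothetical input)
    W_U:       (d_model, d_vocab) unembedding matrix (hypothetical input)
    vocab:     list of d_vocab token strings
    Returns (positive, negative) lists of (token, logit_effect) pairs.
    """
    logit_effect = direction @ W_U          # one value per vocabulary token
    order = np.argsort(logit_effect)        # ascending: most negative first
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
direction = np.array([1.0, 0.0])
W_U = np.array([[0.5, -1.0, 2.0, 0.0],
                [3.0,  3.0, 3.0, 3.0]])
pos, neg = top_logit_tokens(direction, W_U, ["a", "b", "c", "d"], k=2)
print(pos)  # [('c', 2.0), ('a', 0.5)]
print(neg)  # [('b', -1.0), ('d', 0.0)]
```

Sorting once and slicing from both ends keeps the positive and negative lists consistent with each other, as in the two tables above.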
Activation Density: 0.005%
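Assuming activation density means the fraction of token positions on which the latent has a positive activation, 0.005% corresponds to a fraction of 0.00005, or about one token in 20,000. A minimal sketch of that computation follows; `activation_density` and its `activations` input (this latent's activations over some token dataset) are hypothetical names used only for illustration.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the latent's activation exceeds `threshold`."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# 5 active positions out of 100,000 tokens gives a density of 0.00005, i.e. 0.005%.
acts = np.zeros(100_000)
acts[:5] = 0.7
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # 0.005%
```

Multiplying the returned fraction by 100 gives the percentage format shown on the dashboard.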