INDEX
Head Attr Weights
0:0.06
1:0.05
2:0.04
3:0.08
4:0.15
5:0.06
6:0.04
7:0.07
8:0.03
9:0.03
10:0.10
11:0.24
Negative Logits
flush      -2.46
circled    -2.35
eleph      -2.35
flared     -2.28
tremend    -2.09
`.         -2.09
oun        -2.08
``         -2.08
flushed    -2.06
hur        -2.04
Positive Logits
?,       3.15
/,       2.95
chens    2.77
‐        2.63
®,       2.41
@        2.31
,[       2.27
.,       2.27
%,       2.27
","      2.25
Activations Density: 0.009%