Explanations
specific numerical data and references in the text
Head Attribution Weights
Head 0: 0.02
Head 1: 0.01
Head 2: 0.02
Head 3: 0.15
Head 4: 0.02
Head 5: 0.03
Head 6: 0.05
Head 7: 0.02
Head 8: 0.02
Head 9: 0.01
Head 10: 0.56
Head 11: 0.03
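The page does not state how these weights are computed. One plausible reading, assumed here, is that the SAE reads the concatenated per-head attention outputs, so each feature's decoder vector splits into one slice per head, and a head's weight is its share of the total decoder norm. Under that assumption, head 10 would carry most of this feature. The function name `head_attribution_weights` and the toy vector are hypothetical.

```python
import numpy as np

def head_attribution_weights(decoder_vec: np.ndarray, n_heads: int) -> np.ndarray:
    """Hypothetical sketch: fraction of a feature's decoder norm on each head.

    Assumes decoder_vec has length n_heads * d_head, laid out head by head
    (i.e. the SAE was trained on concatenated per-head attention outputs).
    """
    per_head = decoder_vec.reshape(n_heads, -1)   # (n_heads, d_head) slices
    norms = np.linalg.norm(per_head, axis=1)      # L2 norm of each head's slice
    return norms / norms.sum()                    # normalize so weights sum to 1

# Toy example: 12 heads, d_head = 4, with head 10 dominating.
vec = np.full((12, 4), 0.1)
vec[10] = 1.0
weights = head_attribution_weights(vec.ravel(), n_heads=12)
print({h: round(float(w), 2) for h, w in enumerate(weights)})
```

Because the displayed weights are rounded to two decimals, they need not add up to exactly 1 even if the underlying values do.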
Negative Logits
%.     -2.83
'."    -2.67
'.     -2.61
.'"    -2.60
};     -2.58
.」     -2.57
.</    -2.47
.",    -2.40
>.     -2.40
!".    -2.31
Positive Logits
)      3.75
)      3.38
/)     3.14
-)     3.06
)"     2.86
%)     2.74
?)     2.73
)'     2.73
!)     2.55
())    2.51
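The Negative Logits and Positive Logits lists read as the vocabulary tokens whose logits this feature most suppresses and most promotes. Here, sentence-final punctuation is suppressed and closing parentheses are promoted. A common way to produce such lists is direct logit attribution: multiply the feature's decoder direction by the unembedding matrix and sort. The sketch below assumes that method. `decoder_vec`, `W_U`, and `vocab` are hypothetical inputs. If the SAE was trained on per-head attention outputs, the decoder vector would first need to be mapped through the attention output matrix W_O into the residual stream.

```python
import numpy as np

def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Hypothetical sketch of direct logit attribution for a single feature.

    Assumes decoder_vec is a residual-stream direction of shape (d_model,)
    and W_U is the unembedding matrix of shape (d_model, d_vocab).
    """
    logits = decoder_vec @ W_U                  # effect on every vocabulary logit
    order = np.argsort(logits)                  # indices sorted ascending
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]        # most suppressed
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]  # most promoted
    return negative, positive

# Toy example with a 3-dimensional residual stream and a 5-token vocabulary.
vocab = ["a", "b", "c", "d", "e"]
W_U = np.array([[1.0, -1.0, 0.0, 2.0, -2.0],
                [0.0,  0.0, 1.0, 0.0,  0.0],
                [0.0,  0.0, 0.0, 0.0,  0.0]])
decoder_vec = np.array([1.0, 0.5, 0.0])
negative, positive = top_logit_effects(decoder_vec, W_U, vocab, k=2)
print(negative)  # [('e', -2.0), ('b', -1.0)]
print(positive)  # [('d', 2.0), ('a', 1.0)]
```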
Activation Density: 0.311%
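Activation density is presumably the percentage of token positions on which the feature fires. A value of 0.311% means roughly 3 tokens in every 1,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0. Both that threshold and the synthetic activations are assumptions for illustration.

```python
import numpy as np

def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds threshold."""
    return 100.0 * float(np.mean(acts > threshold))

# Synthetic activations: the feature is active on 31 of 10,000 tokens.
acts = np.zeros(10_000)
acts[:31] = 1.3
print(f"{activation_density(acts):.3f}%")  # 0.310%
```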