Explanations
expressions of surprise or unexpected outcomes
Head Attribution Weights
Head  Weight
0     0.19
1     0.02
2     0.21
3     0.10
4     0.03
5     0.06
6     0.03
7     0.06
8     0.03
9     0.03
10    0.15
11    0.04
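A minimal sketch for reading the head weights: it ranks the twelve heads by the weights listed above and sums them. It assumes (the dashboard does not say so) that each weight is the share of the feature's attribution coming from that head. The displayed values are rounded, so their total need not be exactly 1.

```python
# Head attribution weights copied from the table above (head index -> weight).
HEAD_WEIGHTS = {0: 0.19, 1: 0.02, 2: 0.21, 3: 0.10, 4: 0.03, 5: 0.06,
                6: 0.03, 7: 0.06, 8: 0.03, 9: 0.03, 10: 0.15, 11: 0.04}

def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights, largest first."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]

def total_weight(weights):
    """Sum of the displayed (already rounded) weights, rounded to 2 decimals."""
    return round(sum(weights.values()), 2)

print(top_heads(HEAD_WEIGHTS))     # [(2, 0.21), (0, 0.19), (10, 0.15)]
print(total_weight(HEAD_WEIGHTS))  # 0.95
```

Heads 2, 0 and 10 together account for more than half of the listed weight.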
Negative Logits
Token          Logit
transc         -2.31
accessibility  -2.26
retri          -2.22
tag            -2.19
commer         -2.16
recorder       -2.09
conserv        -2.05
refere         -2.03
access         -2.00
accessible     -1.99
Positive Logits
Token          Logit
essim           3.18
icip            2.81
expecting       2.80
expectation     2.69
reassured       2.69
apprehens       2.67
underest        2.62
eyed            2.59
expectations    2.57
anticipating    2.57
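One common way to produce positive and negative logit lists like these is to project a feature's decoder direction through the model's unembedding matrix and keep the most extreme entries. The dashboard does not state its method, so the sketch below assumes this approach. It uses a tiny made-up vocabulary and hypothetical names (`decoder_direction`, `unembed`), and it is not the dashboard's actual code.

```python
from typing import List, Tuple

def logit_effects(decoder_direction: List[float],
                  unembed: List[List[float]]) -> List[float]:
    """Dot the decoder direction (length d_model) with each column of the
    unembedding matrix (d_model x vocab), giving one score per token."""
    vocab_size = len(unembed[0])
    return [sum(decoder_direction[i] * unembed[i][t]
                for i in range(len(decoder_direction)))
            for t in range(vocab_size)]

def top_and_bottom(scores: List[float], vocab: List[str],
                   k: int) -> Tuple[list, list]:
    """Return the k highest- and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom

# Toy example: d_model = 2, vocabulary of 4 tokens (values are made up).
vocab = ["expecting", "tag", "eyed", "access"]
unembed = [[1.0, -1.0, 0.5, -0.5],
           [2.0, -1.0, 1.0, 0.0]]
direction = [1.0, 1.0]
scores = logit_effects(direction, unembed)
top, bottom = top_and_bottom(scores, vocab, k=2)
print(top)     # [('expecting', 3.0), ('eyed', 1.5)]
print(bottom)  # [('tag', -2.0), ('access', -0.5)]
```

Several entries in the real lists are subword fragments (for example `essim`, `icip`) because the scores are computed per vocabulary token, not per word.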
Activation Density: 0.011%
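Activation density is the fraction of token positions on which the feature fires. Assuming "fires" means a strictly positive activation (the dashboard does not give the threshold), 0.011% corresponds to roughly one token in 9,000. A minimal sketch of the computation, using a hypothetical list of activations:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.
    The default threshold of 0.0 is an assumption, not taken from the dashboard."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical activations over 10,000 tokens; the feature fires on exactly one.
acts = [0.0] * 10_000
acts[42] = 3.7
print(activation_density(acts))  # 0.01
```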