INDEX
Explanations
Instances of the words "surprisingly" and "unsurprisingly", along with their related context
Head Attr Weights
0:0.02
1:0.02
2:0.07
3:0.13
4:0.07
5:0.03
6:0.03
7:0.32
8:0.04
9:0.05
10:0.07
11:0.10
Negative Logits
ascript     -1.82
acebook     -1.67
gdala       -1.65
icrobial    -1.65
inav        -1.65
wrap        -1.63
agall       -1.59
mercial     -1.55
yip         -1.51
istries     -1.50
Positive Logits
criticism      1.50
criticisms     1.28
Campus         1.28
occurrence     1.25
retiring       1.24
Decay          1.24
Participant    1.23
scorn          1.22
complaints     1.21
Bucc           1.21
Activations Density 0.004%