Explanations
instances of surprise or realization in various contexts
Negative Logits
uckle      -0.16
ationship  -0.15
ales       -0.15
yonel      -0.15
anian      -0.14
alia       -0.13
->{_       -0.13
离         -0.13
ipe        -0.13
ASC        -0.13
Positive Logits
sight      0.18
uger       0.15
amac       0.15
upon       0.15
habi       0.15
Britann    0.15
upon       0.14
Upon       0.14
fact       0.14
suddenly   0.14
Activations Density 0.092%
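
The page does not define the density figure. As a minimal sketch, assuming it means the fraction of sampled token positions on which the feature has a nonzero activation, 0.092% would correspond to roughly 92 firing positions per 100,000 tokens. The function name `activation_density`, the `threshold` parameter, and the sample data below are hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the
    feature fires; the source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, 92 of which activate.
sample = [0.0] * 99_908 + [1.5] * 92
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.092%
```

Under this reading, the feature is sparse: it is silent on more than 99.9% of positions, which fits a feature tied to specific cues such as "suddenly" or "upon".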