Explanations
words and phrases related to surprise or unexpectedness
Negative Logits
Token            Logit
mav              -0.15
.scalablytyped   -0.15
मर               -0.15
istrovství       -0.15
une              -0.15
casts            -0.14
_ary             -0.14
ajs              -0.14
nd               -0.14
кас              -0.13
Positive Logits
Token            Logit
ingly            0.35
surprise         0.25
surpr            0.22
Surprise         0.21
surprised        0.20
surprises        0.19
ively            0.19
ably             0.18
unexpected       0.18
oeff             0.17
Activations Density 0.040%