INDEX
Explanations
instances of surprise or unexpected events
Negative Logits
.scalablytyped  -0.19
ajs             -0.17
une             -0.16
-ci             -0.15
isters          -0.14
stre            -0.14
ICLES           -0.14
serter          -0.14
ishops          -0.14
gie             -0.13
Positive Logits
ingly      0.32
ably       0.23
surprise   0.20
ously      0.18
Surprise   0.18
surpr      0.18
IPA        0.16
surprised  0.16
ylon       0.16
/errors    0.15
Activations Density 0.051%