INDEX
Explanations
instances of the word "surprise" in various forms and contexts
Negative Logits
token    logit
oble     -0.18
ixels    -0.17
edly     -0.17
oldt     -0.16
oke      -0.16
bare     -0.16
esium    -0.15
ertia    -0.15
mouth    -0.15
ed       -0.15
Positive Logits
token        logit
prisingly    0.27
-sur         0.21
rounded      0.21
prising      0.20
charge       0.20
veillance    0.19
veys         0.19
prises       0.19
jective      0.19
rogate       0.19
Activations Density: 0.019%