Explanations
words related to surprise or shock
expressions of surprise or shock
Negative Logits
Runner    -0.85
aunder    -0.72
iatrics   -0.72
teasp     -0.69
trap      -0.69
eva       -0.68
ractor    -0.68
ettle     -0.68
alde      -0.68
ourage    -0.67
Positive Logits
products  1.04
nature    0.79
what      0.75
how       0.73
virtue    0.71
product   0.66
Sapp      0.64
sheer     0.64
whatever  0.64
these     0.64
Activations Density 0.060%