INDEX
Explanations
sentences expressing surprise or astonishment
phrases indicating surprise or astonishment
Negative Logits
alach      -0.78
enture     -0.75
reau       -0.75
condu      -0.70
conduit    -0.68
illes      -0.67
uti        -0.67
aim        -0.65
vend       -0.61
qui        -0.60
Positive Logits
DERR        0.84
surprise    0.73
IGHT        0.72
iry         0.67
how         0.67
ãĥ©ãĥ³      0.62
headlines   0.61
Appearance  0.61
seeing      0.59
[*          0.59
Activations Density 0.210%