INDEX
Explanations
instances where the text conveys surprise or the potential for surprise
instances of the word "surprised"
Negative Logits
©¶æ           -0.81
ciplinary     -0.78
amins         -0.74
ngth          -0.72
tein          -0.71
uria          -0.71
atra          -0.71
ignty         -0.70
vertisement   -0.70
hid           -0.70
Positive Logits
Squid         0.82
onlook        0.74
terson        0.72
Surprise      0.72
NESS          0.71
imaru         0.69
silence       0.69
surprised     0.67
enough        0.66
Jol           0.65
Activations Density 0.014%