INDEX
Explanations
phrases conveying surprise
expressions of surprise
Negative Logits
ciplinary       -0.84
amins           -0.74
vertisement     -0.73
haar            -0.73
utf             -0.71
ngth            -0.70
alach           -0.69
ueller          -0.69
interstitial    -0.69
ignty           -0.69
Positive Logits
Squid           0.79
imaru           0.74
Pew             0.72
vale            0.70
cules           0.70
HuffPost        0.68
bystanders      0.66
onlook          0.66
silence         0.65
Gry             0.65
Activations Density: 0.023%