INDEX
Explanations
instances where someone is surprised
expressions of surprise
Negative Logits
alach        -0.82
ngth         -0.76
href         -0.74
obe          -0.70
bern         -0.70
utf          -0.70
ciplinary    -0.70
itte         -0.70
iffe         -0.69
tein         -0.69
Positive Logits
enough       0.77
how          0.76
aback        0.73
Squid        0.70
ウス         0.69
Pew          0.69
cules        0.69
Howell       0.65
Robin        0.64
090          0.64
Activations Density 0.036%