INDEX
Explanations
surprised reactions and discoveries
expressions related to surprise and discovery
Negative Logits
vend           -0.72
hung           -0.69
condu          -0.67
ulton          -0.67
aim            -0.65
burgh          -0.63
bye            -0.62
leaf           -0.61
interrupted    -0.61
stake          -0.60
Positive Logits
surprise       0.83
icably         0.69
how            0.67
surprises      0.65
unexpected     0.64
how            0.64
reactions      0.64
cus            0.63
encountering   0.62
surprised      0.61
Activations Density 0.161%