INDEX
Explanations
strong emotional reactions such as shock, surprise, or amazement
expressions of surprise or shock
Negative Logits
forearm          -0.65
uben             -0.65
omo              -0.64
slips            -0.64
porary           -0.63
illusion         -0.62
extracted        -0.61
stake            -0.61
vil              -0.60
redistributed    -0.59
Positive Logits
ingly            1.10
onlook           1.04
us               0.94
me               0.93
him              0.86
investigators    0.77
udic             0.77
audiences        0.76
commenters       0.75
readers          0.73
Activations Density 0.167%
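
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "density" means the fraction of token positions on which the feature's activation is above zero; the page does not define the term, so that reading is an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: density = (# active token positions) / (# token positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 1 of 600 token positions.
sample = [0.0] * 599 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.167%
```

Under this reading, a density of 0.167% corresponds to the feature firing on roughly one token in every 600.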