INDEX
Explanations
phrases related to emotions, particularly relief and elation
expressions of relief or positive emotional responses
Negative Logits
oln        -0.70
lay        -0.66
inates     -0.65
rf         -0.64
eworks     -0.63
orig       -0.63
neau       -0.62
inances    -0.61
waivers    -0.60
method     -0.60
Positive Logits
exclaim         0.85
anticipation    0.82
awaiting        0.79
disbelief       0.78
wondering       0.78
indignation     0.77
recalling       0.77
ashamed         0.74
agony           0.74
exhaustion      0.73
Activations Density 0.277%
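
The density figure above is usually read as the share of sampled tokens on which the feature fires. The page does not say how it was computed, so the following is only a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the toy data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active. The zero threshold is illustrative, not the dashboard's.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 3 of them active.
toy = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.300%
```

On this reading, a density of 0.277% would mean the feature is active on roughly 2.8 of every 1,000 sampled tokens.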