INDEX
Explanations
text fragments expressing emotions, reactions, or personal evaluations
expressions of personal emotions and feelings
Negative Logits
aceutical   -0.64
STA         -0.61
othing      -0.61
Pigs        -0.61
bombshell   -0.60
rising      -0.59
TEXT        -0.58
Times       -0.58
ufact       -0.57
earchers    -0.57
Positive Logits
ineligible   0.84
feel         0.77
uneasy       0.73
eligible     0.73
appealing    0.72
susceptible  0.72
safer        0.70
seem         0.69
hesitate     0.69
reconsider   0.69
Activations Density 0.086%