INDEX
Explanations
expressions of opinions or feelings by individuals
expressions of emotions, sentiments, and opinions
Negative Logits
helicop        -0.73
oby            -0.70
cade           -0.70
batch          -0.70
bsite          -0.68
prominently    -0.67
oldown         -0.66
Produ          -0.66
Wrest          -0.66
uters          -0.66
Positive Logits
desire             1.73
belief             1.70
admiration         1.67
dislike            1.60
feelings           1.51
displeasure        1.50
dissatisfaction    1.49
hatred             1.49
resentment         1.48
frustration        1.43
Activations Density 0.524%