INDEX
Explanations
personal expressions of opinions or emotions
expressions of personal opinions and emotions
Negative Logits
Triumph        -0.72
Hutch          -0.67
Glover         -0.67
overturned     -0.65
Blooming       -0.65
Goodwin        -0.64
staffed        -0.62
Wiggins        -0.62
Malaysia       -0.60
constructed    -0.59
Positive Logits
displeasure    0.88
urances        0.77
rompt          0.77
condolences    0.76
ociation       0.76
Letter         0.75
actionGroup    0.74
superiority    0.73
disapproval    0.73
preference     0.71
Activations Density: 0.140%