Explanations
expressions of strong emotions or reactions, especially negative ones
NEGATIVE LOGITS
imum         -0.91
Formation    -0.86
atari        -0.79
aceutical    -0.77
itivity      -0.73
omi          -0.73
ulton        -0.72
ancial       -0.69
Attribution  -0.68
Virtue       -0.68
POSITIVE LOGITS
ashamed      1.59
humiliated   1.48
frustrated   1.47
disgusted    1.45
saddened     1.44
embarrassed  1.43
depressed    1.42
afraid       1.39
powerless    1.39
confused     1.39
Activations Density: 0.214%