Explanations
strong negative sentiments or expressions of disapproval towards certain topics or individuals
sentiments of dislike or distrust towards people, topics, or situations
Negative Logits
Token        Logit
Olymp        -0.74
ortun        -0.73
itialized    -0.72
timing       -0.71
lymp         -0.71
timely       -0.69
Plex         -0.67
motor        -0.67
cod          -0.67
lucky        -0.65
Positive Logits
Token        Logit
dislike       2.53
disliked      2.42
despise       2.03
disdain       1.99
distrust      1.95
disapprove    1.94
despised      1.94
scorn         1.90
hated         1.88
disapproval   1.77
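The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A minimal sketch of how such lists could be produced, assuming (this is not stated on the page) that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The vectors below are hypothetical 2-dimensional toy values, not the real model weights, and the helper names `logit_effects` and `top_k` are illustrative only.

```python
def logit_effects(decoder_dir, unembed):
    """Score each token by the dot product of the feature's decoder direction
    with the token's unembedding vector (assumed definition)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_k(effects, k, largest=True):
    """Return the k highest-scoring (largest=True) or lowest-scoring tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=largest)
    return ranked[:k]


# Hypothetical toy vectors; real decoder/unembedding rows are much larger.
decoder_dir = [1.0, 0.5]
unembed = {
    "dislike": [2.0, 1.0],
    "despise": [1.5, 1.0],
    "timely": [-0.5, -0.4],
    "lucky": [-0.3, -0.6],
}

effects = logit_effects(decoder_dir, unembed)
print("positive:", top_k(effects, 2))
print("negative:", top_k(effects, 2, largest=False))
```

With these toy values, "dislike" and "despise" come out on the positive side and "timely" and "lucky" on the negative side, mirroring the shape of the tables above; the actual numbers on this page depend on the real model weights.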
Activation density: 0.049%
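Assuming activation density means the fraction of token positions on which the feature's activation is above zero, 0.049% corresponds to roughly 49 firing positions per 100,000 tokens, so this is a sparse feature. A minimal sketch of that calculation; the function name `activation_density`, the zero threshold, and the sample data are hypothetical choices for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 49 of 100,000 token positions.
activations = [0.0] * 99_951 + [1.0] * 49
density = activation_density(activations)
print(f"{density:.3%}")  # prints 0.049%
```

A threshold parameter is kept because some tooling counts only activations above a small positive cutoff; with the default of zero, any positive activation counts as firing.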