Explanations
words related to negative emotions or situations, particularly fear and hatred
terms associated with fear or negative emotions
Negative Logits
increments    -0.68
tom           -0.65
attm          -0.64
ty            -0.62
load          -0.62
trans         -0.59
album         -0.59
Divide        -0.59
composition   -0.58
Rew           -0.58
Positive Logits
feared        3.75
dreaded       1.95
fears         1.70
hated         1.69
fearing       1.68
fear          1.64
despised      1.60
hoped         1.59
disliked      1.53
fearful       1.44
Activation Density: 0.011%