INDEX
Explanations
words related to significant negative reactions or emotions
words associated with negative emotional reactions or social unrest
Negative Logits
carrot            -0.65
fucked            -0.61
ĪĴ                -0.60
©¶æ               -0.60
violet            -0.59
compliment        -0.56
fing              -0.56
alphabet          -0.56
corpse            -0.53
cknow             -0.52
Positive Logits
amongst           1.01
among             0.99
among             0.86
lees              0.74
akin              0.71
elsewhere         0.70
throughout        0.70
domestically      0.69
internationally   0.69
across            0.68
Activations Density: 0.283%