INDEX
Explanations
words related to negative emotions like hatred and distrust
Negative Logits
Token              Logit
aqu                -0.64
inventoryQuantity  -0.59
annis              -0.55
pletion            -0.55
umm                -0.55
orage              -0.54
helicop            -0.54
Horizon            -0.54
atonin             -0.54
changes            -0.52
Positive Logits
Token              Logit
hated              0.66
lessly             0.65
hatred             0.62
toward             0.61
prejudice          0.61
vengeance          0.61
ãĥĨ                0.61
towards            0.60
¿½                 0.58
fully              0.58
Activations Density: 8.367%