INDEX
Explanations
negative sentiments or actions directed towards specific groups or individuals
words reflecting strong emotions or actions related to negativity and conflict
Negative Logits
conclud      -0.61
enegger      -0.60
pherd        -0.59
Carbuncle    -0.59
uana         -0.55
allery       -0.53
Ern          -0.52
Ħ¢           -0.52
leon         -0.51
rul          -0.51
Positive Logits
pes          0.64
ickets       0.57
isively      0.56
heartedly    0.55
bugs         0.55
BT           0.53
Ts           0.52
bryce        0.52
ans          0.51
actively     0.51
Activations Density 0.706%