Explanations
phrases related to strong disapproval or criticism
instances of condemnation related to various actions or events
Negative Logits
Token           Logit
OTE             -0.73
IER             -0.69
soDeliveryDate  -0.66
aldo            -0.66
timer           -0.65
Surviv          -0.64
icle            -0.64
llular          -0.63
enture          -0.63
emis            -0.63
Positive Logits
Token           Logit
abuses          0.99
atrocities      0.95
racism          0.93
vandalism       0.91
bigotry         0.88
hypocrisy       0.88
abuse           0.88
enance          0.86
violence        0.84
actions         0.84
Activations Density: 0.083%