Explanations
phrases related to strong disapproval or criticism
instances of condemnation or denouncement of actions or events
Negative Logits
ramid         -0.81
IER           -0.70
membr         -0.68
Wonders       -0.68
emis          -0.66
BLIC          -0.65
seed          -0.63
icle          -0.63
NetMessage    -0.62
aldo          -0.62
Positive Logits
condemn          0.90
homophobic       0.83
unequivocally    0.82
ations           0.79
atrocities       0.79
condemning       0.79
harshly          0.75
condemnation     0.75
ifiable          0.75
urous            0.75
Activation density: 0.055%
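To make the density figure concrete, here is a minimal sketch. It assumes that activation density means the fraction of evaluated tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and only illustrate the arithmetic. Under that assumption, 11 firing tokens out of 20,000 give 0.055%:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition for illustration: a token "fires" when its
    activation is strictly greater than the threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 20,000 evaluated tokens, of which 11 fire.
acts = [0.0] * 20_000
for i in range(0, 11_000, 1_000):  # indices 0, 1000, ..., 10000 -> 11 tokens
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.055%"
```

A density this low means the feature is very sparse: it fires on roughly one token in every 1,800.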