INDEX
Explanations
words related to condemning or showing disapproval
Negative Logits
ramid         -0.76
IER           -0.69
emis          -0.68
lio           -0.67
neau          -0.66
BLIC          -0.65
membr         -0.65
OTE           -0.64
NetMessage    -0.64
seed          -0.63
Positive Logits
condemn         0.97
condemning      0.84
homophobic      0.82
harshly         0.81
racism          0.81
unequivocally   0.80
unres           0.79
condemnation    0.78
ations          0.77
atrocities      0.77
Activations Density: 0.044%