Explanations
phrases related to expressing strong disapproval or criticism
expressions of condemnation or disapproval
Negative Logits
ramid      -0.89
Wonders    -0.73
Solitaire  -0.70
impro      -0.69
iago       -0.68
negie      -0.67
membr      -0.66
chn        -0.64
aldo       -0.64
hack       -0.63
Positive Logits
condemn       0.92
condemning    0.87
urous         0.80
harshly       0.78
condemnation  0.77
condemns      0.77
ations        0.76
unres         0.75
homophobic    0.74
denounce      0.72
Activations Density 0.039%