INDEX
Explanations
words related to criticism or negative comments
terms associated with derogatory or disparaging language
NEGATIVE LOGITS
ramid      -0.81
ħĭ         -0.80
RH         -0.75
hner       -0.71
GOODMAN    -0.70
*/(        -0.68
HCR        -0.67
eanor      -0.66
ource      -0.66
metic      -0.66
POSITIVE LOGITS
dispar     1.00
ately      0.93
agement    0.91
ously      0.84
aging      0.79
Vaugh      0.78
uously     0.77
ãĥ¥        0.70
ached      0.70
leveled    0.70
Activations Density 0.012%