Explanations
verbs or adjectives related to causing harm or negative impact
words related to negative impacts or harm inflicted on individuals or groups
Negative Logits
USS          -0.77
NAACP        -0.57
boarded      -0.56
ipe          -0.56
shit         -0.54
ussen        -0.52
icipated     -0.51
POSE         -0.51
Gree         -0.50
aughed       -0.49
Positive Logits
by           1.33
therein      1.09
herein       1.00
during       0.94
BY           0.94
by           0.94
during       0.82
By           0.81
thereto      0.80
upon         0.79
Activation Density: 0.181%