Explanations
derogatory comments and remarks made online
instances of derogatory or insulting comments and expressions
Negative Logits
rebuild       -0.82
verning       -0.81
rebuilding    -0.78
glim          -0.77
rebuilt       -0.74
taxed         -0.72
stabilize     -0.72
quartered     -0.71
ensable       -0.69
workings      -0.68
Positive Logits
Redditor       1.15
               1.04
retweet        1.03
tumblr         0.98
TMZ            0.97
tweets         0.96
Tumblr         0.95
selfies        0.95
derogatory     0.93
misogyn        0.93
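One common way dashboards like this derive positive and negative logit lists (assumed here, not stated on the page) is to project the feature's decoder direction through the model's unembedding matrix and sort the per-token scores. The sketch below uses plain Python with toy, made-up vectors; `logit_effects` and `top_and_bottom` are hypothetical helper names, and the dictionary stands in for the columns of a real unembedding matrix.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    # Assumed definition: dot product of the feature's decoder direction with each
    # token's unembedding vector, i.e. the feature's direct push on that token's logit.
    return {tok: sum(d * u for d, u in zip(decoder_dir, col)) for tok, col in unembed.items()}


def top_and_bottom(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]        # most negative scores: tokens the feature suppresses
    top = ranked[::-1][:k]     # most positive scores: tokens the feature promotes
    return bottom, top


# Hypothetical 2-dimensional toy example; these are NOT real model weights.
decoder_dir = [1.0, -0.5]
unembed = {
    "tweets": [0.8, -0.4],
    "rebuild": [-0.6, 0.5],
    "the": [0.1, 0.2],
}
scores = logit_effects(decoder_dir, unembed)
bottom, top = top_and_bottom(scores, k=1)
print("most suppressed:", bottom)
print("most promoted:", top)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; a real vocabulary would use a vectorized matrix product instead of a Python loop.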
Activation density: 0.647%
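Activation density is usually read as the share of sampled tokens on which the feature is active. A density of 0.647% would mean roughly 6 to 7 tokens in every thousand. A minimal sketch, assuming "active" means a strictly positive activation (the page does not state the threshold); `activation_density` is a hypothetical helper and the activation values are invented for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one token")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical example: a feature that fires on 647 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(647):
    acts[i * 100] = 1.3  # made-up activation value on the firing tokens
print(f"{activation_density(acts):.3f}%")  # prints 0.647%
```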