INDEX
Explanations
phrases related to social media posts and public remarks
instances of public remarks and complaints related to social media controversies
Negative Logits
JV            -0.74
elsen         -0.68
prus          -0.68
asio          -0.67
dilig         -0.67
BSD           -0.67
longitudinal  -0.67
hinge         -0.66
stagn         -0.65
ggles         -0.65
Positive Logits
blasp            1.08
inappropriate    1.00
derogatory       0.99
insulting        0.97
nudity           0.94
falsely          0.94
inappropriately  0.94
lewd             0.94
pornographic     0.93
offending        0.91
Activations Density: 0.583%