INDEX
Explanations
internet-related content with potential legal or offensive implications, specifically related to social media comments or online behavior
Negative Logits
urion          -0.77
cells          -0.75
ongevity       -0.73
endurance      -0.71
asio           -0.70
Ports          -0.69
Endurance      -0.69
byss           -0.69
ulton          -0.69
regeneration   -0.68
Positive Logits
derogatory     1.34
pornographic   1.30
blasp          1.30
insulting      1.25
objectionable  1.23
lewd           1.20
slurs          1.18
misogyn        1.17
hateful        1.16
swast          1.15
Activations Density: 0.746%