INDEX
Explanations
offensive or inappropriate content
Negative Logits
bakalım           0.84
saver             0.71
unbeaten          0.69
спери             0.67
दिलचस्प           0.67
pherd             0.67
꒪                 0.66
দর                0.65
Scissors          0.65
PickerController  0.64
Positive Logits
sexual            1.69
sexually          1.68
offensive         1.68
content           1.56
depictions        1.55
vulgar            1.52
hateful           1.52
derogatory        1.51
inappropriate     1.49
misog             1.49
Activations Density: 2.982%