Explanations
warning messages in text
warnings or notifications regarding inappropriate or graphic content
Negative Logits
inertia      -0.79
--+          -0.74
Recovery     -0.73
staggered    -0.72
buck         -0.71
retire       -0.69
waiting      -0.69
town         -0.68
swoop        -0.68
patiently    -0.66
Positive Logits
pornographic   1.63
nudity         1.58
objectionable  1.37
depictions     1.33
depicting      1.32
satire         1.30
derogatory     1.26
lewd           1.25
misogyn        1.24
blasp          1.22
Activations Density 0.530%