Explanations
references to sexual assault
references to sexual assault and related crimes
Negative Logits
issue       -0.81
clips       -0.81
immune      -0.75
issues      -0.75
hire        -0.72
etheless    -0.72
rek         -0.72
edition     -0.70
lore        -0.70
fare        -0.69
Positive Logits
him             0.87
them            0.73
unsuspecting    0.72
prostitutes     0.72
himself         0.70
innoc           0.67
raping          0.67
Samantha        0.65
innocent        0.65
aggressively    0.65
Activations Density 0.083%