INDEX
Explanations
terms related to sexual violence and harassment
references to sexual misconduct and related terms
Negative Logits
Dispatch    -0.90
ALS         -0.77
ICA         -0.76
iard        -0.76
Breaker     -0.75
GV          -0.75
IVERS       -0.72
tower       -0.72
REC         -0.71
Glob        -0.69
Positive Logits
intercourse  1.16
assault      1.02
ity          0.99
ized         0.97
ization      0.93
ensl         0.91
assaults     0.90
harassment   0.90
ities        0.89
misconduct   0.89
Activations Density: 0.023%