INDEX
Explanations
terms related to instances of sexual assault
terms related to sexual assault and harassment
Negative Logits
aer          -0.70
Objective    -0.67
ãĤ©          -0.66
snipp        -0.65
IPS          -0.64
ãĥķãĤ©       -0.63
xxxxxxxx     -0.63
newsp        -0.62
peror        -0.62
coat         -0.62
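Entries such as `ãĤ©` and `ãĥķãĤ©` in the negative-logit list look like raw byte-level BPE token strings rather than encoding damage. The sketch below assumes the GPT-2 style byte-to-unicode table, which is an assumption since this page does not name the tokenizer. It inverts that table to recover the underlying UTF-8 text. The helper name `decode_token` is illustrative only.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed tokenizer)."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Bytes without a printable form are shifted to code points >= 256.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map each displayed character back to its byte, then decode as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # The two non-ASCII entries from the negative-logit list, written as escapes.
    for tok in ["\u00e3\u0124\u00a9", "\u00e3\u0125\u0137\u00e3\u0124\u00a9"]:
        print(repr(tok), "->", decode_token(tok))
```

Under that assumption, the two entries decode to the katakana fragments ォ and フォ, which fits the rest of the list: tokens unrelated to the feature's topic.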
Positive Logits
abuse            0.95
rape             0.89
allegations      0.86
slurs            0.83
assment          0.82
inappropriately  0.81
abuse            0.81
perpetrated      0.80
victims          0.80
Sexual           0.79
Activations Density 0.066%
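
The page does not define the density figure. A common reading, assumed here, is the fraction of sampled tokens on which the feature has a nonzero activation. The sketch below computes that fraction on synthetic data; the function name `activation_density` and the sample sizes are made up for illustration and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = len(activations)
    if total == 0:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / total


if __name__ == "__main__":
    # Synthetic example: 66 active tokens out of 100,000 sampled tokens.
    acts = [1.0] * 66 + [0.0] * (100_000 - 66)
    print(f"{activation_density(acts):.3%}")
```

A density this low would mean the feature fires on roughly one token in 1,500, which is typical of a narrow, topic-specific feature.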