INDEX
Explanations
descriptions of violent or harmful actions perpetrated against individuals
occurrences of distressing or violent events related to individuals
Negative Logits
Token                 Logit
isSpecialOrderable    -1.04
dividends             -0.84
univers               -0.78
Remastered            -0.75
successors            -0.74
optim                 -0.70
bedrock               -0.70
provinces             -0.69
Balanced              -0.69
coales                -0.68
Positive Logits
Token                 Logit
raped                 1.11
raped                 1.11
masturb               1.07
raping                1.04
grop                  1.04
handcuffed            1.03
sexually              1.01
stabbed               0.97
nude                  0.96
assaulted             0.96
Activations Density: 0.567%