INDEX
Explanations
occurrences of physical violence and assault
instances of violence or physical harm
Negative Logits
Token        Logit
Labor        -0.71
ethic        -0.64
istries      -0.64
FK           -0.63
alities      -0.63
hang         -0.62
lore         -0.61
ARE          -0.60
Us           -0.59
formation    -0.59
Positive Logits
Token        Logit
ĸļ           0.94
aback        0.91
by           0.80
hostage      0.79
merciless    0.78
unintention  0.71
quished      0.71
Sapphire     0.70
ô            0.70
unfairly     0.69
Activations Density 0.208%
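
A figure like the activation density above can be illustrated with a minimal sketch. It assumes that density means the share of token positions on which the feature's activation is above zero; the page does not state the exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption: "density" is the fraction of positions where the feature fires,
    expressed as a percentage. The dashboard does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 100,000 token positions, 208 of which activate the feature.
sample = [0.0] * 99_792 + [1.5] * 208
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.208% would correspond to the feature firing on roughly 2 of every 1,000 tokens in the sampled text.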