INDEX
Explanations
violence-related actions, particularly physical attacks
instances of physical violence or assault
Negative Logits
ãĥĩãĤ£             -0.83
DragonMagazine     -0.81
iterranean         -0.75
imester            -0.74
gio                -0.74
arcity             -0.73
inventoryQuantity  -0.71
CLASS              -0.71
encia              -0.71
lopp               -0.70
Positive Logits
senseless    0.95
buttocks     0.94
unconscious  0.91
classmate    0.87
rapist       0.81
girlfriend   0.80
asshole      0.77
unarmed      0.77
innocent     0.77
violently    0.77
Activations Density 0.338%
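
A minimal sketch of how a density figure like the one above could be computed, assuming "activation density" means the fraction of sampled token positions on which the feature has a nonzero activation. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and chosen for illustration only; they are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is counted over individual token positions,
    and a position counts as active when its activation is > threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 1000 sampled tokens: 3 fire, 997 do not.
sample = [0.0] * 997 + [1.2, 0.4, 2.7]
density = activation_density(sample)

# Format as a percentage with three decimals, matching the dashboard style.
print(f"{density:.3%}")
```

Under this reading, a density of 0.338% would correspond to the feature firing on roughly 3 to 4 of every 1000 sampled tokens.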