Explanations
references to physical violence and assault
references to acts of violence or abuse against individuals
Negative Logits
Siege         -0.75
Jump          -0.66
NetMessage    -0.65
Agriculture   -0.62
inelli        -0.62
heny          -0.61
icion         -0.61
Sund          -0.61
shaw          -0.60
vine          -0.60
Positive Logits
selves         1.05
self           0.89
unconscious    0.85
tremend        0.85
atic           0.84
alian          0.80
atically       0.78
uncond         0.75
harmless       0.74
aggress        0.73
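Logit lists like the two above usually rank vocabulary tokens by the feature's direct effect on the output logits. The sketch below assumes that effect is the feature's decoder vector multiplied by the model's unembedding matrix, ignoring any final layer norm; `decoder_dir`, `W_U`, and `top_logit_effects` are hypothetical names, not the API of any particular tool. The toy usage picks an identity unembedding so a few of the listed values come back unchanged.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Direct logit effect of a feature: decoder_dir (d_model) @ W_U (d_model x d_vocab).

    Assumption: no layer-norm scaling is applied between the residual stream and W_U.
    """
    d_model, d_vocab = len(decoder_dir), len(W_U[0])
    return [sum(decoder_dir[r] * W_U[r][c] for r in range(d_model))
            for c in range(d_vocab)]


def top_logit_effects(decoder_dir: Sequence[float],
                      W_U: Sequence[Sequence[float]],
                      vocab: Sequence[str],
                      k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most suppressed and the k most promoted tokens."""
    effects = logit_effects(decoder_dir, W_U)
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])  # ascending
    negative = ranked[:k]          # most negative effect first
    positive = ranked[::-1][:k]    # most positive effect first
    return negative, positive


if __name__ == "__main__":
    # Toy example (hypothetical data): identity unembedding, so effects equal decoder_dir.
    vocab = ["selves", "self", "Siege", "Jump", "harmless"]
    W_U = [[1.0 if r == c else 0.0 for c in range(5)] for r in range(5)]
    neg, pos = top_logit_effects([1.05, 0.89, -0.75, -0.66, 0.74], W_U, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

With a real model the unembedding is much wider than it is tall, so only the top few entries of each end of the ranking are shown, as in the lists above.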
Activation Density: 0.284%
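Activation density is commonly read as the share of token positions on which the feature fires at all; 0.284% works out to roughly one token in every 350. The sketch below assumes "fires" means an activation strictly above zero (typical for ReLU-style features); the `activation_density` name and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Hypothetical data: 284 active positions out of 100,000 tokens.
    acts = [1.0] * 284 + [0.0] * (100_000 - 284)
    print(f"{activation_density(acts):.3f}%")
```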