INDEX
Explanations
words related to physical violence or harm inflicted on individuals by others
terms related to various forms of violence and assault
Negative Logits
tions        -0.73
istries      -0.72
alities      -0.68
FK           -0.67
antz         -0.66
negie        -0.64
formation    -0.64
rium         -0.63
Zone         -0.62
issue        -0.61
Positive Logits
by             1.02
merciless      0.91
nikov          0.79
aback          0.78
repeatedly     0.76
BY             0.75
by             0.74
anew           0.74
relentlessly   0.73
violently      0.73
Activations Density: 0.160%