Explanations
references to various forms of violence and their impact on individuals and communities
Negative Logits
bjerg      -0.17
cheid      -0.17
alle       -0.16
lington    -0.16
bsd        -0.15
loth       -0.14
rov        -0.14
sWith      -0.14
ạo         -0.14
lí         -0.14

Positive Logits
levard      0.16
inch        0.15
pu          0.14
unken       0.14
ink         0.14
/↵          0.14
Singles     0.14
locked      0.14
firm        0.13
275         0.13
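Dashboards of this kind commonly derive the positive and negative logit lists by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the k largest and k smallest entries. Whether this page computes them exactly that way (for example, whether layer norm is folded in first) is an assumption. The names `logit_effects` and `top_and_bottom` and the toy data below are hypothetical, for illustration only.

```python
def logit_effects(decoder_dir, W_U):
    """Project a feature's decoder direction onto the unembedding.

    decoder_dir: list of length d_model (hypothetical feature direction).
    W_U: d_model rows, each a list of vocab_size floats (hypothetical unembedding).
    Returns one logit effect per vocabulary token.
    """
    d_model = len(decoder_dir)
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[m] * W_U[m][v] for m in range(d_model))
        for v in range(vocab_size)
    ]


def top_and_bottom(effects, tokens, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda v: effects[v])
    bottom = [(tokens[v], effects[v]) for v in order[:k]]           # most negative first
    top = [(tokens[v], effects[v]) for v in reversed(order[-k:])]   # most positive first
    return top, bottom


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
tokens = ["a", "b", "c", "d"]
decoder_dir = [1.0, 0.5]
W_U = [
    [0.1, -0.2, 0.3, 0.0],
    [0.2, 0.4, 0.0, -0.4],
]
effects = logit_effects(decoder_dir, W_U)
top, bottom = top_and_bottom(effects, tokens, k=2)
print("positive:", top)
print("negative:", bottom)
```

A real model would use vectors with thousands of dimensions and a vocabulary of tens of thousands of tokens, but the ranking step is the same.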
Activation Density: 0.012%
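Activation density is usually reported as the fraction of sampled tokens on which the feature's activation is above zero. Under that assumption, 0.012% means the feature fires on roughly 12 of every 100,000 tokens. The sketch below uses a hypothetical `activation_density` helper and synthetic activations, not data from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic example: 12 nonzero activations among 100,000 tokens.
activations = [0.0] * 100_000
for i in range(12):
    activations[i * 8_000] = 1.5

density = activation_density(activations)
print(f"{density * 100:.3f}%")
```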