Explanations
events related to violence or threats against specific communities or individuals
Negative Logits
iceps        -0.14
989          -0.14
殺           -0.14
æĿĢ          -0.14
deadliest    -0.14
enci         -0.14
ect          -0.14
assassin     -0.13
986          -0.13
killers      -0.13
Positive Logits
vandal       0.34
spray        0.33
vandalism    0.32
damage       0.31
gra          0.30
graffiti     0.29
ван          0.28
Spray        0.28
graf         0.26
damaged      0.24
Activations Density 0.082%