Explanations
incidents of violence and related threats
Negative Logits
resses      -0.14
989         -0.14
착          -0.14
986         -0.14
ционный     -0.14
روب         -0.13
織          -0.13
iceps       -0.13
raping      -0.13
ैद          -0.13
Positive Logits
spray       0.35
vandalism   0.34
graffiti    0.34
damage      0.33
vandal      0.33
Spray       0.30
gra         0.28
ван         0.28
Damage      0.27
damaged     0.27
Activations Density: 0.092%