INDEX
Explanations
references to murder and conspiracy-related events
Negative Logits
Offensive      -0.16
anke           -0.16
Thief          -0.15
à¸Ĺาà¸Ļ         -0.14
%-             -0.14
Disaster       -0.14
confiscated    -0.14
ertino         -0.13
üz             -0.13
Cata           -0.13
Positive Logits
killing        0.59
kill           0.56
kill           0.53
murder         0.52
kills          0.50
killings       0.50
killed         0.49
mur            0.48
Kill           0.47
Kill           0.47
Activations Density 0.412%