Explanations
references to killing and murder
Negative Logits
iNdEx       -0.76
متحده       -0.76
Himo        -0.73
>",         -0.72
wireType    -0.68
theless     -0.67
อิง         -0.65
cheme       -0.64
Manus       -0.64
arabe       -0.63
Positive Logits
kill        3.04
kills       2.83
killing     2.73
Kill        2.71
KILL        2.68
kill        2.68
killed      2.67
Kill        2.50
Kills       2.40
KILL        2.35
Activations Density: 0.053%