INDEX
Explanations
references to killing or death
Negative Logits
Altman       -0.82
Himo         -0.81
theless      -0.77
iNdEx        -0.76
Folks        -0.69
dized        -0.67
Applicant    -0.67
متحده        -0.65
>",          -0.65
Manly        -0.64
Positive Logits
kill         2.03
kills        1.94
Kill         1.93
KILL         1.93
kill         1.85
killing      1.80
killed       1.72
Kill         1.71
Kills        1.71
KILL         1.69
Activations Density: 0.049%