Explanations
killing and death
death and killing
Negative Logits
ס         0.92
CB        0.85
са        0.85
ای        0.85
TION      0.84
ي         0.83
CAT       0.81
TMP       0.80
T         0.80
است       0.80
Positive Logits
at        0.86
死的      0.79
by        0.71
deaths    0.70
death     0.69
kill      0.69
I         0.68
killings  0.68
muerte    0.66
arme      0.65
Activations Density: 0.669%