INDEX
Explanations
actions related to confrontation and incapacitation in high-stakes scenarios
Negative Logits
สร    -0.16
assin    -0.15
762    -0.15
042    -0.15
oted    -0.14
jud    -0.14
å¡    -0.14
peare    -0.14
etre    -0.14
anza    -0.14
Positive Logits
ownt    0.19
kses    0.16
ëŀį    0.15
ông    0.15
engu    0.15
conti    0.15
exus    0.14
/lg    0.14
ochrome    0.14
uda    0.14
Activations Density 0.143%