Explanations
terms related to innocence and victims
Negative Logits
Token         Logit
lfw           -0.17
phan          -0.17
ilet          -0.16
ongan         -0.15
illisecond    -0.15
688           -0.15
ionic         -0.15
yte           -0.14
alog          -0.14
imli          -0.14
Positive Logits
Token         Logit
innocent      0.27
bystand       0.25
innocence     0.24
innoc         0.24
Innoc         0.23
harmless      0.19
civilians     0.18
/simple       0.17
victims       0.16
-looking      0.16
Activations Density: 0.011%