Explanations
mentions of innocence and associated concepts
mentions of innocent individuals or groups
Negative Logits
division    -0.74
TOP         -0.74
sych        -0.73
ANN         -0.72
artney      -0.71
jri         -0.71
riber       -0.69
ingo        -0.69
pain        -0.69
è¦ļéĨĴ      -0.68
Positive Logits
bystand     1.37
bystanders  1.29
innocent    1.10
innocence   0.91
civilians   0.84
victims     0.84
prey        0.81
ocent       0.81
innoc       0.77
ously       0.73
Activations Density: 0.017%