Explanations
phrases related to killing and violence
Negative Logits
esto        -0.16
å±          -0.16
igo         -0.15
orial       -0.15
ted         -0.14
906         -0.14
acles       -0.14
ential      -0.14
838         -0.14
olina       -0.14
Positive Logits
ábado       0.19
throp       0.15
abyrin      0.14
icie        0.14
Dim         0.13
iciel       0.13
ifestyles   0.13
Jennings    0.13
ourg        0.13
plotlib     0.13
Activations Density: 0.040%