Explanations
references to killing or murder
Negative Logits
orial   -0.15
esta    -0.14
olina   -0.14
aux     -0.14
cred    -0.14
omaly   -0.14
Jord    -0.14
hollow  -0.13
laş     -0.13
acles   -0.13
Positive Logits
ambi        0.17
abyrin      0.17
/renderer   0.15
iani        0.15
ouser       0.15
/goto       0.15
ustr        0.15
IVEN        0.15
ifestyles   0.14
995         0.14
Activations Density 0.029%