Explanations
mentions of religious or spiritual figures, particularly priests
Negative Logits
Token    Logit
aines    -0.17
avage    -0.16
cream    -0.15
zin      -0.14
icz      -0.14
acre     -0.14
cre      -0.14
磨       -0.14
wend     -0.14
align    -0.14
Positive Logits
Token    Logit
ests     0.19
Pri      0.19
anka     0.17
iminary  0.17
ory      0.17
klad     0.17
incess   0.17
ilege    0.16
ories    0.16
ори      0.16
Activations Density 0.007%