INDEX
Explanations
references to religious or sacred concepts
references to sacrilege or sacrificial themes
Negative Logits
è¦ļéĨĴ      -0.72
Cheong      -0.71
Clarkson    -0.71
ITNESS      -0.67
Turk        -0.65
Imran       -0.65
ITH         -0.64
Pebble      -0.64
XT          -0.63
DERR        -0.62
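The first negative-logit entry, è¦ļéĨĴ, looks like a token printed in the printable alphabet of a byte-level BPE vocabulary rather than damaged text. The following is a minimal sketch, assuming a GPT-2-style byte-to-unicode mapping (the page does not say which tokenizer it uses); the helper names `byte_to_unicode_table` and `decode_token` are hypothetical and exist only for this illustration.

```python
def byte_to_unicode_table():
    """Build a GPT-2-style map from each byte value (0-255) to a printable character.

    Assumption: the dashboard renders tokens with this mapping. Bytes that are
    already printable map to themselves; the rest are shifted to code points
    starting at 256.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


def decode_token(token):
    """Convert a byte-level token string back to the UTF-8 text it encodes."""
    char_to_byte = {ch: b for b, ch in byte_to_unicode_table().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # errors="replace" keeps partial multi-byte tokens from raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("è¦ļéĨĴ"))
```

Under that assumption the entry decodes to 覚醒; plain ASCII tokens such as Cheong pass through unchanged.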
Positive Logits
ificial     1.25
rament      1.18
char        1.16
rifice      1.04
raf         1.00
rum         0.99
anus        0.97
iety        0.96
het         0.96
ral         0.95
Activations Density 0.019%