INDEX
Explanations
references to sacrificial practices or related concepts
terms associated with sacrilege or sacrificial concepts
Negative Logits
Token      Logit
ITNESS     -0.80
Cheong     -0.73
覚醒       -0.71
agher      -0.70
Turk       -0.66
Pebble     -0.65
Nanto      -0.64
Clarkson   -0.63
NING       -0.63
ITH        -0.63
Positive Logits
Token      Logit
rament     1.19
ificial    1.18
rifice     1.11
char       1.09
het        0.99
iety       0.96
rum        0.92
ré         0.91
ros        0.91
sac        0.90
Activations Density: 0.008%