INDEX
Explanations
mentions of the word "sacred"
references to sacredness or sacrificial themes
Negative Logits
ITNESS       -0.77
Cheong       -0.76
Turk         -0.66
NING         -0.65
ITH          -0.65
Imran        -0.64
expectancy   -0.64
agher        -0.64
è¦ļéĨĴ        -0.63
DERR         -0.63
Positive Logits
ificial      1.29
rament       1.19
char         1.12
rum          1.05
anus         1.04
het          1.01
raf          1.00
ram          0.97
ros          0.95
ral          0.95
Activations Density 0.015%