INDEX
Explanations
references to beliefs in religious concepts, particularly heaven and hell, among specific demographic groups
Negative Logits
Al      -0.15
Imm     -0.15
Pur     -0.15
Russ    -0.15
869     -0.14
Sist    -0.14
dual    -0.14
Allow   -0.14
N       -0.13
joint   -0.13
Positive Logits
uard        0.19
Ñıк         0.17
adar        0.15
_NC         0.15
éº          0.15
ãĥ«ãĤ¯      0.15
agem        0.15
_manifest   0.15
виÑħ        0.14
uhan        0.14
Activations Density 0.002%