INDEX
Explanations
terms related to religious affiliations, specifically focusing on Mormonism and Presbyterianism
Negative Logits
ord     -0.17
lessly  -0.16
olar    -0.15
ola     -0.14
anki    -0.14
Mens    -0.14
g       -0.14
ary     -0.13
on      -0.13
ec      -0.13
Positive Logits
Hao     0.16
hm      0.15
¦¬      0.15
ipple   0.15
uchos   0.15
ãĥ§     0.14
lien    0.14
oser    0.14
.hm     0.14
Blast   0.14
Activations Density 0.008%