Explanations
references to Jewish identity, community, and institutions
Negative Logits
Lutheran    -0.16
Anglic      -0.15
xic         -0.15
hem         -0.15
tings       -0.15
utter       -0.15
nici        -0.14
umber       -0.14
635         -0.14
จน          -0.14
Positive Logits
-Christian  0.20
ness        0.20
-Owned      0.20
-owned      0.18
enco        0.17
/non        0.17
-American   0.16
esses       0.15
#           0.15
576         0.15
Activations Density 0.026%