INDEX
Explanations
references to Jewish cultural or religious identities and communities
Negative Logits
ンツ       -0.16
Slee      -0.15
erc       -0.15
巻        -0.15
anje      -0.14
iene      -0.14
stell     -0.14
iat       -0.14
Ramadan   -0.14
emarks    -0.14
Positive Logits
تل        0.17
TL        0.14
Michaels  0.14
帯        0.14
帶        0.14
Dro       0.14
dropping  0.13
oming     0.13
ama       0.13
Å         0.13
Activations Density 0.008%