Explanations
references to Jewish identity and related terms
Negative Logits
Token    Logit
als      -0.16
kers     -0.16
ker      -0.15
Islam    -0.15
ument    -0.15
ative    -0.15
urs      -0.15
ode      -0.15
Pett     -0.14
isi      -0.14
Positive Logits
Token               Logit
-Owned              0.19
enco                0.18
-Christian          0.18
uales               0.16
ness                0.16
/non                0.15
anken               0.15
íky                 0.14
otionEvent          0.14
ABCDEFGHIJKLMNOP    0.14
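The page does not say how the negative and positive logit lists above were produced. A common approach in sparse-autoencoder dashboards is to project the feature's decoder direction through the model's unembedding matrix and keep the most negative and most positive tokens. The sketch below assumes exactly that and ignores any final layer norm. `logit_effects`, `top_logits`, and the toy matrix are hypothetical illustrations, not the dashboard's own code.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def logit_effects(decoder_dir: Sequence[float], unembed: Matrix,
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Map each token to the dot product of decoder_dir with that token's unembedding column.

    Assumption: `unembed` is laid out as d_model rows x vocab_size columns,
    and the logit effect is the direct path only (no layer norm).
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("decoder_dir length must equal the number of unembed rows")
    return {
        token: sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j, token in enumerate(vocab)
    }


def top_logits(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]                  # smallest effects first
    positive = list(reversed(ranked))[:k]  # largest effects first
    return negative, positive


if __name__ == "__main__":
    # Toy numbers only: d_model = 2, vocabulary of 4 tokens.
    vocab = ["a", "b", "c", "d"]
    unembed = [[0.2, -0.1, 0.0, 0.3],
               [0.1, 0.2, -0.4, 0.0]]
    effects = logit_effects([1.0, 0.5], unembed, vocab)
    negative, positive = top_logits(effects, k=2)
    print("Negative:", negative)
    print("Positive:", positive)
```

With a real model, `decoder_dir` would be the feature's decoder row and `vocab` the tokenizer vocabulary; the lists above show the top ten entries of each side.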
Activation Density 0.029%
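A density of 0.029% means the feature is active on roughly 29 of every 100,000 tokens. The sketch below shows that arithmetic, assuming density is the share of sampled token positions whose activation is strictly above zero; the page states neither the threshold nor the sample, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation is strictly above `threshold`.

    Assumption: "active" means activation > threshold (default 0.0).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy data: 29 active positions out of 100,000 tokens.
    acts = [0.0] * 99_971 + [1.3] * 29
    print(f"{activation_density(acts):.3f}%")
```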