INDEX
Explanations
references to Jewish culture and religion
references to Jewish identity or the term "Jew."
Negative Logits
nda        -0.63
Cortex     -0.59
role       -0.59
open       -0.58
atha       -0.58
drivers    -0.58
RTX        -0.57
doi        -0.57
utic       -0.56
2010       -0.56
Positive Logits
Jew        4.28
Jew        2.99
jew        2.17
jew        2.15
Jews       2.10
Jews       1.90
Judaism    1.86
Jewish     1.83
Jewish     1.72
Juda       1.65
Activations Density: 0.017%