INDEX
Explanations
mentions of Jewish identity or related terms
Jew or Jewish
Negative Logits
Token      Logit
Eriksson   -0.64
Driscoll   -0.61
Lombardi   -0.61
Erik       -0.60
Erik       -0.59
Arias      -0.59
Doran      -0.57
Talbot     -0.56
Rivas      -0.56
Jansen     -0.55
Positive Logits
Token      Logit
Jew        2.05
Jew        1.85
Jews       1.58
jew        1.48
Jews       1.41
jew        1.28
Juifs      1.25
Jewish     1.06
judíos     1.05
Jewish     1.03
Activation Density: 0.005%