Explanations
references to Jewish people and related terms that may indicate stereotypes or negative sentiments
Negative Logits
InputDecoration    -0.44
paraître           -0.44
itoare             -0.39
AppColors          -0.38
InputBorder        -0.37
szól               -0.36
jScrollPane        -0.35
Unterscheidung     -0.35
rungsseite         -0.35
composición        -0.35
Positive Logits
Jewish             0.69
Jews               0.69
Jew                0.69
Jewish             0.68
Hebrew             0.64
Jews               0.63
jewish             0.62
Judaism            0.61
Tear               0.60
Datuak             0.60
Activations Density: 1.899%