Explanations
phrases or words related to negative sentiments and biases
phrases or terms related to anti-Semitic sentiments and actions
Negative Logits
Token        Logit
scratch      -0.89
laure        -0.80
fold         -0.76
precincts    -0.74
curled       -0.74
gorge        -0.74
thumbnail    -0.72
dots         -0.72
Brus         -0.70
dashed       -0.70
Positive Logits
Token          Logit
Semitic        1.81
Semitism       1.69
immigrant      1.62
establishment  1.57
government     1.57
democratic     1.55
choice         1.53
capitalist     1.52
abortion       1.51
gay            1.51
Activations Density 0.033%
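
An activation density like the one above is commonly read as the fraction of token positions on which the feature fires. The source does not state its exact definition, so the sketch below assumes that reading. The helper name `activation_density`, the zero threshold, and the sample data are all hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a non-zero
    (above-threshold) activation. The source page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which activate the feature.
sample = [0.0] * 9997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.033% would mean the feature fires on roughly 1 in 3,000 token positions, which is typical of a sparse, narrowly scoped feature.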