Explanations
words related to specific ethnic and religious groups
references to specific national or ethnic identities, particularly focusing on Muslims, Palestinians, and Americans
Negative Logits
Token          Logit
ologies        -1.04
abilities      -0.94
ories          -0.88
arters         -0.87
sections       -0.87
rings          -0.85
ravings        -0.85
irens          -0.84
suites         -0.84
events         -0.84
Positive Logits
Token          Logit
citizen         1.08
woman           1.04
politician      1.04
colleague       1.02
journalist      1.00
who             1.00
diplomat        0.99
teenager        0.98
businessman     0.95
prostitute      0.95
Activation density: 0.215%