INDEX
Explanations
mentions of different nationalities or political affiliations
references to specific nationalities and ethnicities, particularly in the context of American and Middle Eastern identities
Negative Logits
ologies     -1.03
bows        -1.02
ravings     -1.02
ories       -1.00
rils        -0.92
acies       -0.91
abilities   -0.91
encies      -0.90
onies       -0.88
isites      -0.87
Positive Logits
who         1.09
citizen     1.07
politician  1.03
woman       1.01
colleague   1.00
traveler    1.00
thinker     0.98
traveller   0.96
journalist  0.95
believer    0.94
Activations Density: 0.260%