INDEX
Explanations
references to different religious and cultural identities
references to religious and political identities, particularly focused on Muslims and Republicans
Negative Logits
Token      Logit
Canaver    -0.61
uador      -0.60
Guan       -0.57
]'         -0.55
Salv       -0.53
KER        -0.53
GOODMAN    -0.53
Rohing     -0.52
oneself    -0.52
extrad     -0.52
Positive Logits
Token          Logit
counterparts   1.07
counterpart    0.94
brethren       0.84
selves         0.80
cousins        0.79
buddies        0.77
holdings       0.75
arsenal        0.74
cousin         0.74
ancestors      0.72
Activations Density 0.801%