INDEX
Explanations
mentions of specific types or groups of people within certain communities
references to social or political groups
Negative Logits
iour               -0.70
Delivery           -0.68
natureconservancy  -0.65
amaz               -0.63
Donation           -0.62
giveaway           -0.62
inated             -0.60
inse               -0.59
å½                 -0.58
imm                -0.58
Positive Logits
pace     1.03
creen    0.98
folk     0.93
ģĸ       0.89
circles  0.87
chool    0.87
hare     0.86
wagon    0.85
peak     0.84
hift     0.83
Activations Density: 0.036%