Explanations
mentions of specific groups or categories of people
Negative Logits
Nut                   -0.71
æ©Ł                   -0.71
Deal                  -0.70
pmwiki                -0.69
Inspection            -0.68
Accessory             -0.67
uner                  -0.66
quickShipAvailable    -0.66
tun                   -0.66
Completed             -0.64
Positive Logits
hood                  1.01
queer                 0.86
patriarchy            0.85
paces                 0.83
bisexual              0.81
everywhere            0.81
pronouns              0.81
feminism              0.76
genital               0.76
lesbian               0.75
Activations Density 0.127%