Explanations
insulting words or phrases
derogatory terms or insults directed at individuals
Negative Logits
Token          Logit
anecd          -0.80
proponents     -0.79
advocates      -0.74
often          -0.74
collectively   -0.73
Cosponsors     -0.72
campaigns      -0.72
requently      -0.71
Develop        -0.70
anwhile        -0.70
Positive Logits
Token          Logit
whore          1.33
fuckin         1.31
fucking        1.26
girl           1.25
cunt           1.24
pussy          1.24
bitch          1.23
dick           1.20
goddamn        1.18
guy            1.17
Activations Density: 0.547%