INDEX
Explanations
offensive and derogatory language, including profanity and insults
derogatory terms and insults
Negative Logits
apers        -0.82
mental       -0.79
enza         -0.76
ctica        -0.74
ainted       -0.73
ENC          -0.73
âĹ¼          -0.72
undai        -0.72
ORN          -0.71
inguished    -0.71
Positive Logits
bitch        1.33
buster       1.01
fuck         0.96
asses        0.96
hole         0.95
holes        0.91
cunt         0.88
bastard      0.86
whore        0.84
bast         0.81
Activations Density 0.008%
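As a rough illustration of how a density figure like the one above could be read, the sketch below assumes that activation density means the percentage of tokens on which the feature has a nonzero activation. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 8 of 100,000 tokens.
sample = [0.0] * 99_992 + [1.5] * 8
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under that assumption, a density of 0.008% corresponds to the feature firing on roughly 8 tokens per 100,000.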