Explanations
offensive and derogatory terms
terms related to sexuality and derogatory language
Negative Logits
ulhu         -0.84
actic        -0.81
scl          -0.79
owan         -0.79
sonian       -0.77
son          -0.74
Flavoring    -0.73
ointment     -0.73
ERG          -0.72
acter        -0.71

Positive Logits
panties       0.94
nuns          0.80
pussy         0.78
lips          0.77
vagina        0.76
Melania       0.76
breasts       0.74
Lucia         0.73
boobs         0.73
Riot          0.72
Activations Density 0.054%