INDEX
Explanations
vulgar and offensive terms
references to sexual or vulgar terms
Negative Logits
iation         -0.83
othy           -0.75
ril            -0.75
aneously       -0.75
GRE            -0.74
VERTISEMENT    -0.73
ORGE           -0.68
reek           -0.67
OR             -0.66
oS             -0.66
Positive Logits
ussy           0.97
ignt           0.82
panties        0.80
holes          0.79
cat            0.79
hole           0.79
Riot           0.78
essee          0.77
chet           0.77
lips           0.76
Activations Density 0.011%