INDEX
Explanations
content related to adult themes and ratings in media
Negative Logits
RAIN           -0.14
rael           -0.14
fucked         -0.14
cunt           -0.14
.semantic      -0.14
pornografia    -0.13
ixer           -0.13
_fixture       -0.13
asma           -0.13
öl             -0.13
Positive Logits
prof           0.20
brief          0.20
references     0.19
strong         0.19
references     0.18
thematic       0.17
nudity         0.17
heavy          0.16
language       0.16
drug           0.16
Activations Density: 0.007%