Explanations
references to nudity or being naked
NEGATIVE LOGITS
fried      -0.16
kle        -0.15
erras      -0.15
Ìĥ         -0.15
rides      -0.15
azes       -0.14
agal       -0.14
ầm         -0.14
eprom      -0.14
skirts     -0.14
POSITIVE LOGITS
/raw                  0.23
bare                  0.22
Naked                 0.21
naked                 0.20
bare                  0.20
/null                 0.18
revealed              0.17
.githubusercontent    0.17
bones                 0.17
ness                  0.17
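On dashboards of this kind, the positive and negative logit lists are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the resulting weight. The minimal sketch below assumes that method; the function name `top_logit_tokens`, the row-per-model-dimension layout of `W_U`, and the toy values in the usage note are hypothetical and not taken from this page.

```python
def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by how strongly a feature direction pushes their logits.

    Assumption: logit weight for token t = sum_d decoder_dir[d] * W_U[d][t],
    where W_U has one row per model dimension and one column per token.
    """
    vocab_size = len(vocab)
    weights = [
        sum(decoder_dir[d] * W_U[d][t] for d in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]
    # Sort token indices from most negative to most positive weight.
    ranked = sorted(range(vocab_size), key=lambda t: weights[t])
    negative = [(vocab[t], weights[t]) for t in ranked[:k]]
    positive = [(vocab[t], weights[t]) for t in reversed(ranked[-k:])]
    return positive, negative
```

For example, with a two-dimensional toy direction `[1.0, 0.0]` and a four-token vocabulary, the function returns the two highest-weight tokens as `positive` and the two lowest as `negative`, mirroring the two lists above.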
Activation density: 0.019%
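Activation density is usually read as the fraction of sampled tokens on which the feature fires, so 0.019% corresponds to roughly 19 firing tokens per 100,000. The sketch below assumes "fires" means an activation strictly above zero; the exact threshold and token sample used for this page are not stated, and the `activation_density` helper is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy check: 19 firing tokens out of 100,000 gives about 0.019%.
acts = [0.0] * 99_981 + [1.5] * 19
print(f"{activation_density(acts) * 100:.3f}%")
```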