INDEX
Explanations
references to explicit and adult-themed content
Negative Logits
anse       -0.16
aukee      -0.15
ivre       -0.15
ÑĪин       -0.15
.dist      -0.14
dge        -0.14
alta       -0.14
masked     -0.14
Kahn       -0.14
atomic     -0.14
Positive Logits
Ĥ          0.15
-hole      0.15
-boy       0.15
/rem       0.14
CRET       0.14
naughty    0.14
chten      0.14
ยà¸ĩ        0.14
coup       0.14
cream      0.13
Activations Density 0.019%