Explanations

oral or otherwise

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

Token             Logit
" Erotic"         -0.22
" erotic"         -0.21
" Erot"           -0.20
" masturbating"   -0.19
" erotik"         -0.18
" sexually"       -0.18
" erot"           -0.18
" erotica"        -0.18
"Erot"            -0.18
" sexuality"      -0.18

Positive Logits

Token             Logit
" anal"           0.19
" penetr"         0.17
" missionary"     0.17
" oral"           0.16
"Anal"            0.16
"vag"             0.16
" Anal"           0.15
"插"              0.15
" Oral"           0.14
" penetration"    0.14

Activations Density 0.151%
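
To put the density figure in context, the sketch below estimates how many tokens it would correspond to under the dashboard configuration listed above (10,000 prompts of 128 tokens each). It assumes that the activation density is the fraction of dashboard tokens on which the feature fires at all. The page does not state this definition, so treat the result as a rough estimate only. The variable names are illustrative.

```python
# Rough estimate only. Assumption (not stated on the page): activation
# density = fraction of dashboard tokens with a nonzero feature activation.
NUM_PROMPTS = 10_000               # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128            # from "Prompts (Dashboard)"
ACTIVATION_DENSITY = 0.151 / 100   # 0.151% expressed as a fraction

# Total number of token positions in the dashboard sample.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Approximate number of positions where the feature is active.
approx_active_tokens = round(total_tokens * ACTIVATION_DENSITY)

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {approx_active_tokens:,}")
```

Under that assumption the feature would be active on roughly 1,900 of the 1.28 million token positions, which is consistent with a sparse, topic-specific feature.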