INDEX

Explanations

DAN can do anything; character performs

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

Token      Logit
"kee"      -0.13
"linger"   -0.09
"ellig"    -0.09
" Morr"    -0.08
"paque"    -0.08
" pref"    -0.08
" still"   -0.08
"nist"     -0.08
"antal"    -0.08

Positive Logits

Token         Logit
"CAN"         0.10
" circum"     0.10
" indeed"     0.09
"_CAN"        0.09
" doing"      0.09
" Doing"      0.09
" overcome"   0.09
"åģļ"         0.09
" {\\n"       0.09
"can"         0.09

Activations Density 0.012%