Explanations

task initiation verbs

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token         Logit
" ours"       -0.09
"wer"         -0.09
"blob"        -0.08
"unter"       -0.08
"/shared"     -0.08
"ialized"     -0.08
"aved"        -0.08
"Wer"         -0.08
" certain"    -0.07
"Ma"          -0.07

Positive Logits

Token         Logit
" your"       0.15
" yourself"   0.12
"/Create"     0.12
"你的"        0.11
"your"        0.11
"/create"     0.10
" Your"       0.10
"IIIK"        0.10
"您的"        0.10
"HeaderCode"  0.10

Activations Density 0.170%
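
As a rough sanity check, the density figure can be turned into an approximate count of activating token positions. The sketch below assumes that "Activations Density" is the fraction of dashboard tokens on which the feature fires, and that it was measured over the 10,000 prompts of 128 tokens listed under Configuration. Neither assumption is stated on the page, and the function name is hypothetical.

```python
def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated number of tokens where the feature is active).

    Assumes density_percent is the percentage of all dashboard tokens
    with a nonzero activation (an assumption, not confirmed by the page).
    """
    total = num_prompts * tokens_per_prompt
    active = round(total * density_percent / 100)
    return total, active


# Values taken from the Configuration and Activations Density entries.
total, active = estimated_active_tokens(10_000, 128, 0.170)
print(f"{total:,} tokens in total, about {active:,} with a nonzero activation")
```

Under these assumptions the feature fires on only a couple of thousand positions out of more than a million.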