Explanations

`be one of [option list]`

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token         Logit
" Gast"       -0.09
"Dak"         -0.09
"preh"        -0.09
"ath"         -0.09
"aff"         -0.09
" warning"    -0.09
"zer"         -0.09
" Kansas"     -0.08
"edom"        -0.08
"oub"         -0.08

Positive Logits

Token         Logit
"<typeof"     0.12
"之一"        0.11
" verb"       0.11
" either"     0.10
"utenberg"    0.10
"either"      0.10
" actions"    0.10
" Twist"      0.09
"oji"         0.09
" ones"       0.09

Activations Density: 0.043%