Explanations

options from the given

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

" insets"     -0.10
" imperson"   -0.09
"afone"       -0.09
"erse"        -0.09
"Tam"         -0.09
"gebn"        -0.08
"inois"       -0.08
"ufe"         -0.08
" inne"       -0.08

Positive Logits

" letters"    0.27
" Letters"    0.22
"Letters"     0.21
" choices"    0.21
"letters"     0.21
" options"    0.17
"choices"     0.15
" letter"     0.14
" Choices"    0.14
" choice"     0.14

Activations Density: 0.035%