Explanations

naming and description

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

Token             Logit
" correctness"    -0.09
"okable"          -0.09
" '\''"           -0.09
"redirectTo"      -0.09
" correct"        -0.09
" doğru"          -0.09
" Vert"           -0.09
"Cow"             -0.08
"conte"           -0.08
"gnu"             -0.08

Positive Logits

Token             Logit
" appropriate"    0.17
" unique"         0.17
" suitable"       0.17
" meaningful"     0.16
"適"              0.16
"unique"          0.15
"任"              0.14
" arbit"          0.14
" desired"        0.13
"appropriate"     0.13
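
Non-ASCII tokens in logit lists like these are sometimes displayed in a byte-alias form (for example, 適 shown as `éģ©`). The sketch below shows one way to recover the readable text. It assumes the model's tokenizer uses GPT-2 style byte-level BPE, which this page does not state; the helper name `decode_token` is hypothetical.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2 style byte-level BPE."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are aliased to code points starting at U+0100.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: alias character -> original byte value.
ALIAS_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(alias: str) -> str:
    """Map a byte-alias token string back to readable UTF-8 text.

    Characters outside the alias table (such as a literal space) are
    passed through unchanged as their own UTF-8 bytes.
    """
    raw = bytearray()
    for ch in alias:
        if ch in ALIAS_TO_BYTE:
            raw.append(ALIAS_TO_BYTE[ch])
        else:
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


print(decode_token("éģ©"))      # 適
print(decode_token("ä»»"))      # 任
print(decode_token(" doÄŁru"))  # " doğru"
```

Under this assumption, the decoded tokens fit the rest of the list: 適 and 任 are CJK characters meaning roughly "suitable" and "any/arbitrary", and doğru is Turkish for "correct".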

Activations Density 0.108%
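
To put the density figure in context, the arithmetic can be sketched as follows. This assumes that "activations density" means the fraction of token positions on which the feature has a nonzero activation, and that it was measured over the dashboard's 10,000 prompts of 128 tokens each; the page does not confirm either point, and `activation_density` is a hypothetical helper.

```python
# Dashboard configuration as listed on the page.
N_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY = 0.108 / 100  # 0.108% expressed as a fraction

total_tokens = N_PROMPTS * TOKENS_PER_PROMPT
# Assumption: density = active token positions / total token positions.
expected_active = DENSITY * total_tokens

print(total_tokens)            # 1280000
print(round(expected_active))  # 1382


def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    `activations` is a list of per-prompt lists of feature activations.
    """
    flat = [a for row in activations for a in row]
    if not flat:
        return 0.0
    return sum(a > threshold for a in flat) / len(flat)


# Toy example: one active position out of eight.
print(activation_density([[0, 0, 1.5, 0], [0, 0, 0, 0]]))  # 0.125
```

Under these assumptions the feature fires on roughly 1,400 of the 1.28 million token positions, i.e. about one position in every 900.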