INDEX

Explanations

support and opposition


Top Features by Cosine Similarity

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m


Negative Logits

 sore         -0.10
 unanimous    -0.10
acle          -0.10
_strings      -0.09
 dislike      -0.09
 dislikes     -0.09
wagon         -0.09
 Succ         -0.09
 treating     -0.08
 parch        -0.08

Positive Logits

 support       0.22
支持            0.21
 supports      0.21
 favor         0.21
 demand        0.20
 favour        0.20
supports       0.18
 advocate      0.18
 Supports      0.18
 выступ        0.17

Activations Density 0.108%