Explanations

emphasizing importance or worth

Top Features by Cosine Similarity

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

" conceivable"   -0.09
" myst"          -0.09
"ANNEL"          -0.09
"Lau"            -0.08
" Wilkinson"     -0.08
"AINS"           -0.08
"inki"           -0.08
"ndl"            -0.08
".Classes"       -0.08

Positive Logits

" worth"         0.41
" Worth"         0.32
"worth"          0.31
" note"          0.18
" important"     0.18
" стоит"         0.16
" should"        0.16
"sworth"         0.16
" worthy"        0.15
" importante"    0.14

Activations Density 0.014%