Explanations

emphasize or clarify that

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

"데이트"  -0.10
"죽"  -0.09
" refr"  -0.09
"tha"  -0.09
";break"  -0.09
" proc"  -0.09
"ih"  -0.08
"fri"  -0.08
"ieval"  -0.08
" conf"  -0.08

Positive Logits

" bahwa"  0.22
" rằng"  0.20
" that"  0.15
" again"  0.14
" że"  0.13
"that"  0.13
" ότι"  0.13
" dass"  0.12
"again"  0.11
" että"  0.11

Activations Density 0.042%
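
To put the density figure in perspective, the following is a rough sketch of the arithmetic, assuming (this is not stated on the page) that the density is the fraction of dashboard tokens on which the feature fires, and that the dashboard sample is exactly 10,000 prompts of 128 tokens each. The function name `expected_active_tokens` is hypothetical and used only for illustration.

```python
def expected_active_tokens(prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> float:
    """Estimate how many tokens activate the feature.

    Assumption: density is measured over all dashboard tokens,
    i.e. prompts * tokens_per_prompt positions in total.
    """
    total_tokens = prompts * tokens_per_prompt
    return total_tokens * density_percent / 100.0


if __name__ == "__main__":
    # Values taken from the dashboard: 10,000 prompts, 128 tokens each, 0.042% density.
    estimate = expected_active_tokens(10_000, 128, 0.042)
    print(f"total tokens: {10_000 * 128}")
    print(f"estimated activating tokens: {estimate:.0f}")
```

Under these assumptions the feature would fire on roughly 540 of the 1,280,000 sampled tokens, which is in line with a sparse feature tied to a single function word ("that" and its equivalents in other languages).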