INDEX

Explanations

such as

Top Features by Cosine Similarity

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

Архів

-0.12

[email

-0.11

£

-0.10

надлеж

-0.09

âĪ

-0.09

_Lean

-0.09

 hopefully

-0.09

平成

-0.09

inclusive

-0.09

Positive Logits

例如

0.28

eg

0.25

 such

0.22

 например

0.22

 např

0.22

eg

0.22

such

0.21

0.19

 Например

0.17

0.16

Activations Density 0.055%