INDEX

Explanations

is a / set of

Top Features by Cosine Similarity

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token           Logit
" Rigidbody"    -0.09
" familiar"     -0.09
"arend"         -0.09
"_mgr"          -0.08
" quot"         -0.08
"ÂĢÂĢ"          -0.08
"_STARTED"      -0.08
"族自治"        -0.08
"adesh"         -0.08
" titul"        -0.08

Positive Logits

Token           Logit
"not"           0.16
" hasn"         0.13
" nicht"        0.13
" không"        0.12
" term"         0.12
"ังไม"          0.12
" नह"           0.12
" rare"         0.11
" chưa"         0.11
"ไม"            0.11

Activations Density 0.322%