INDEX

Explanations

The word "the" followed by a concept

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

Token       Logit
'最'        -0.10
'macen'     -0.09
'тон'       -0.09
' 最'       -0.09
'osyal'     -0.09
'ize'       -0.08
'"group'    -0.08
'екар'      -0.08
'HONE'      -0.08
'，最'      -0.08

Positive Logits

Token           Logit
' same'         0.14
'的一个'        0.12
' equivalent'   0.12
'oric'          0.11
' wrong'        0.11
'ather'         0.10
'ws'            0.10
' beginnings'   0.10
' remains'      0.10
'wrong'         0.10
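
Some of the tokens in these lists are non-ASCII, and dashboards of this kind often show them in the remapped alphabet that byte-level BPE tokenizers use internally, so 最 can appear as æľĢ. The sketch below shows one way to convert between the two forms. It assumes the model's tokenizer follows the GPT-2 byte-to-unicode convention, which this page does not state; the helper names are hypothetical.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table: byte value (0-255) -> printable character."""
    # Bytes that are already printable map to their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte gets a stand-in character starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: display character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_display_token(token: str) -> str:
    """Turn a byte-level display token back into readable text.

    Characters outside the table (for example a plain space that the page
    rendered in place of the usual space marker) are passed through as UTF-8.
    """
    raw = b"".join(
        bytes([CHAR_TO_BYTE[ch]]) if ch in CHAR_TO_BYTE else ch.encode("utf-8")
        for ch in token
    )
    return raw.decode("utf-8", errors="replace")


for shown in ["æľĢ", "ÑĤÐ¾Ð½", "çļĦä¸Ģä¸ª", " same"]:
    print(repr(shown), "->", repr(decode_display_token(shown)))
```

ASCII tokens such as ' same' pass through unchanged, so the helper can be applied to a whole logit list without special-casing.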

Activations Density 0.304%
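
To give the density figure a sense of scale, the arithmetic below converts it into an approximate token count. It assumes, hypothetically, that the density was measured over the dashboard prompts listed under Configuration; the page does not say which token sample the figure refers to.

```python
# Figures taken from the Configuration and Activations Density entries.
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.304

# Assumption: the density is the share of these tokens with a nonzero activation.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
approx_active = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"tokens with nonzero activation (approx.): {approx_active:,.0f}")
```

Under that assumption, the feature would fire on roughly 3,900 of the 1.28 million tokens, or about one token in every 330.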