Explanations

her way

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

" drowning"     -0.11
" Tong"         -0.09
"トリ"          -0.09
" mastur"       -0.09
" sclerosis"    -0.09
" cười"         -0.09
"ASTER"         -0.09
" Sachs"        -0.09
"usercontent"   -0.08
" tong"         -0.08

Positive Logits

"rag"           0.11
" improvis"     0.11
" ration"       0.11
"jag"           0.11
" survival"     0.10
"lim"           0.10
" irregular"    0.10
" Orient"       0.10
"collapsed"     0.10
" Alley"        0.10

Activations Density 0.052%
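
To give the density figure some scale, here is a minimal sketch that converts it into an approximate token count. It assumes (this is not stated on the page) that the density is the fraction of dashboard tokens on which the feature activates, and that it was measured over the full set of 10,000 prompts of 128 tokens each. The function name `expected_active_tokens` is hypothetical.

```python
def expected_active_tokens(num_prompts: int, tokens_per_prompt: int, density_percent: float):
    """Return (total tokens, approximate number of tokens with a nonzero activation).

    Assumption: density_percent is the share of all dashboard tokens
    on which the feature fires, expressed as a percentage.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens


# Figures taken from the dashboard: 10,000 prompts, 128 tokens each, 0.052% density.
total, active = expected_active_tokens(10_000, 128, 0.052)
print(f"{total:,} tokens in total, roughly {active:.0f} with a nonzero activation")
```

Under these assumptions the feature would fire on only a few hundred of the 1.28 million tokens, which is consistent with a sparse feature.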