Explanations

assistant : toxic : assign : 202

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token      Logit
"Lou"      -0.09
"tick"     -0.08
"Ler"      -0.08
"iker"     -0.08
"icker"    -0.08
"/an"      -0.08
"onta"     -0.08
" knee"    -0.08
" Vere"    -0.08
"agens"    -0.07

Positive Logits

Token                 Logit
" ç·"                 0.11
"dek"                 0.09
"<|end_header_id|>"   0.09
"asaki"               0.09
"ï¼¿_"                0.09
"liest"               0.08
"ship"                0.08
".Formatter"          0.08
"xlim"                0.08
"ÚĨÙĩ"                0.08

Activations Density 0.000%