Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token          Logit
"å´"           -0.09
"birthdate"    -0.09
"ickness"      -0.09
" pastoral"    -0.08
"ahun"         -0.08
" ante"        -0.08
" Jake"        -0.08
" Carey"       -0.08
" kale"        -0.08
"FAQ"          -0.08

Positive Logits

Token          Logit
" poll"        0.30
" Poll"        0.26
"poll"         0.26
" pollen"      0.25
"Poll"         0.20
".poll"        0.17
" polls"       0.16
" POLL"        0.16
" polling"     0.16
" sexual"      0.15

Activations Density: 0.039%
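
To give a rough sense of scale for the density figure, the sketch below converts it into an approximate token count. It assumes, and the page does not confirm this, that the density is the fraction of tokens in the dashboard prompts (10,000 prompts, 128 tokens each) on which the feature has a nonzero activation. The function name and constants are illustrative only.

```python
# Hypothetical back-of-the-envelope estimate.
# Assumption: density = share of dashboard tokens with nonzero activation.
NUM_PROMPTS = 10_000        # from the Configuration section
TOKENS_PER_PROMPT = 128     # from the Configuration section
DENSITY_PERCENT = 0.039     # reported activation density


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated number of tokens where the feature fires)."""
    total_tokens = num_prompts * tokens_per_prompt
    active = round(total_tokens * density_percent / 100)
    return total_tokens, active


total_tokens, active_tokens = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"{active_tokens} of {total_tokens} tokens")
```

Under that assumption the feature would fire on roughly 500 of the 1.28 million dashboard tokens, which fits a narrow feature centred on "poll"-like tokens.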