Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token         Logit
"joint"       -0.09
"Pek"         -0.09
" ener"       -0.09
"artisan"     -0.09
" Caller"     -0.09
"erty"        -0.08
"aison"       -0.08
" Neville"    -0.08
" Cousins"    -0.08
"uct"         -0.08

Positive Logits

Token          Logit
" person"      0.21
" citizen"     0.18
" citizens"    0.15
" listener"    0.14
" human"       0.14
" Person"      0.13
" listeners"   0.13
" friend"      0.13
"cit"          0.12
" daughter"    0.12

Activations Density 0.079%