Explanations

series of positive concepts

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token               Logit
"ording"            -0.09
"andon"             -0.09
"irit"              -0.09
" consort"          -0.08
"oler"              -0.08
" goodness"         -0.08
" Pike"             -0.08
"ìį¨"               -0.08
".Transactional"    -0.08
"ÙĩÙĨ"              -0.08
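
Two of the negative-logit tokens above ("ìį¨" and "ÙĩÙĨ") look like raw byte-level BPE renderings rather than readable text. The page does not name the tokenizer, so the following is only a sketch under the assumption that it uses the GPT-2 style byte-to-character table; `bytes_to_unicode` and `decode_token` are illustrative helper names, not part of the dashboard.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2 style
    byte-level BPE (assumed here; the dashboard does not name its tokenizer)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are remapped to code points from 256 upward.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return dict(zip(byte_values, (chr(c) for c in code_points)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Map a displayed token string back to the text it encodes."""
    raw = bytearray()
    for ch in display:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. a literal space that the
            # page has already decoded) are passed through unchanged.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for token in ["ìį¨", "ÙĩÙĨ", " consort"]:
        print(repr(token), "->", repr(decode_token(token)))
```

If the assumption holds, the first string decodes to a single Hangul syllable and the second to two Arabic letters, while plain-ASCII tokens such as " consort" pass through unchanged.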

Positive Logits

Token               Logit
"/or"               0.18
"/OR"               0.11
" mutual"           0.10
"olis"              0.10
"gum"               0.10
" collaboration"    0.09
"Kor"               0.09
" proper"           0.09
" prag"             0.08
"asma"              0.08

Activations Density: 0.119%
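
To give the density figure a sense of scale, the sketch below converts it into an approximate token count. It assumes the density is the fraction of token positions with nonzero activation, measured over the same 10,000 prompts of 128 tokens listed under Configuration; the page does not state either point, and the function name is illustrative.

```python
from fractions import Fraction


def estimate_active_tokens(n_prompts: int, tokens_per_prompt: int,
                           density_percent: str) -> Fraction:
    """Return the expected number of token positions where the feature fires.

    Assumes density = active token positions / total token positions,
    which is an interpretation, not something the dashboard states.
    """
    total_tokens = n_prompts * tokens_per_prompt
    density = Fraction(density_percent) / 100  # exact decimal arithmetic
    return total_tokens * density


if __name__ == "__main__":
    total = 10_000 * 128
    active = estimate_active_tokens(10_000, 128, "0.119")
    print(f"total token positions: {total}")
    print(f"estimated active positions: {float(active):.1f}")
```

Under these assumptions the feature fires on roughly 1,500 of the 1,280,000 token positions, which is consistent with a sparse feature.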