INDEX

Explanations

distributed or shared

Top Features by Cosine Similarity

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token          Logit
" centrally"   -0.10
" irres"       -0.09
"ipl"          -0.09
" disgr"       -0.09
"Fah"          -0.08
"edom"         -0.08
"ковод"        -0.08
" Dive"        -0.08
"umping"       -0.08

Positive Logits

Token            Logit
" spread"        0.62
" Spread"        0.50
"spread"         0.49
"分布"           0.46
"Spread"         0.46
" distributed"   0.46
" distribute"    0.42
"散"             0.41
" distribution"  0.40
" distrib"       0.39

Activations Density 0.150%