Explanations

The token "a" followed by descriptive words

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token      Logit
"sth"      -0.10
"igs"      -0.10
"kk"       -0.10
"ﾞ"        -0.10
"탄"        -0.10
" сб"      -0.09
"eros"     -0.08
" stun"    -0.08
"ims"      -0.08
"centage"  -0.08
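Byte-level BPE vocabularies store non-ASCII tokens as surrogate strings such as `íĥĦ` or `ÑģÐ±`, which is why some entries in token lists look like encoding debris. The sketch below shows one way to turn such strings back into readable text. It assumes the tokenizer uses the GPT-2 style byte-to-character table; the helper names `bytes_to_unicode` and `decode_token` are illustrative and not taken from the dashboard.

```python
def bytes_to_unicode():
    """Build a GPT-2 style table mapping each byte value to a printable character."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in printable}
    shift = 0
    for b in range(256):
        if b not in table:
            # Bytes without a printable form are moved to code points from 256 up.
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: surrogate character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Map a byte-level token string back to the UTF-8 text it stands for."""
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Characters outside the table (for example a literal space that a
            # dashboard already substituted for the space marker) pass through.
            raw.extend(ch.encode("utf-8"))
    # Tokens can end mid-character, so undecodable bytes are replaced, not raised.
    return raw.decode("utf-8", errors="replace")


readable = [decode_token(t) for t in ["sth", " stun", "íĥĦ", " ÑģÐ±"]]
```

Plain ASCII tokens pass through unchanged, so the same function can be applied to a whole token list without special-casing.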

Positive Logits

Token       Logit
"bit"       0.13
" dose"     0.13
"few"       0.12
" heads"    0.11
" chance"   0.11
"ird"       0.11
"ton"       0.11
" taste"    0.10
" helping"  0.10
"/an"       0.10
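Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and then ranking the vocabulary by the resulting scores. The sketch below illustrates that idea on a toy vocabulary. It is an assumption about the method, not a description of how this dashboard computed its numbers, and the names `logit_effects` and `top_and_bottom` as well as the toy values are hypothetical.

```python
def logit_effects(decoder_direction, unembedding, vocab):
    """Score each token by the dot product of the feature direction with its unembedding column.

    decoder_direction: list of d_model floats (hypothetical feature direction)
    unembedding: d_model rows, each holding one float per vocabulary entry
    vocab: list of token strings, one per unembedding column
    """
    effects = []
    for col, token in enumerate(vocab):
        score = sum(d * row[col] for d, row in zip(decoder_direction, unembedding))
        effects.append((token, score))
    # Highest score first, so positive effects lead and negative effects trail.
    return sorted(effects, key=lambda pair: pair[1], reverse=True)


def top_and_bottom(effects, k):
    """Return the k most promoted and the k most suppressed tokens."""
    positive = effects[:k]
    negative = effects[-k:][::-1]  # most negative first
    return positive, negative


# Toy example: a 2-dimensional model with a 3-token vocabulary (made-up values).
toy_vocab = ["bit", "sth", " dose"]
toy_unembedding = [
    [0.5, -0.5, 0.0],  # contribution of dimension 0 to each token's logit
    [9.0, 9.0, 9.0],   # dimension 1 is ignored by the direction below
]
toy_direction = [1.0, 0.0]

ranked = logit_effects(toy_direction, toy_unembedding, toy_vocab)
positive, negative = top_and_bottom(ranked, 1)
```

In the toy case the direction only reads dimension 0, so "bit" is the most promoted token and "sth" the most suppressed.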

Activation Density: 0.115%
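An activation density is usually read as the share of token positions on which the feature fires. The sketch below shows that calculation and the token count implied by the dashboard configuration of 10,000 prompts with 128 tokens each. It assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the threshold are illustrative choices, not confirmed details of the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    activations: list of prompts, each a list of per-token activation values.
    """
    flat = [a for prompt in activations for a in prompt]
    if not flat:
        return 0.0
    active = sum(1 for a in flat if a > threshold)
    return active / len(flat)


# Token budget implied by the configuration: 10,000 prompts x 128 tokens.
TOTAL_TOKENS = 10_000 * 128

# Approximate number of active token positions implied by a 0.115% density.
approx_active = round(TOTAL_TOKENS * 0.115 / 100)

# Tiny illustration: one active position out of four.
example_density = activation_density([[0.0, 1.5], [0.0, 0.0]])
```

Under these assumptions, a density of 0.115% over 1,280,000 token positions corresponds to roughly 1,472 active positions.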