Explanations

prompt response and action

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token           Logit
"rien"          -0.09
" continued"    -0.09
"ongo"          -0.09
"defer"         -0.08
"ritz"          -0.08
"sid"           -0.08
"amik"          -0.08
"unt"           -0.08
"/TR"           -0.08
" impatient"    -0.08

Positive Logits

Token           Logit
" acted"        0.33
"act"           0.29
" quick"        0.28
" action"       0.27
" acting"       0.26
" действ"       0.24
"反应"          0.22
" reaction"     0.22
" react"        0.21
"quick"         0.21

Activations Density 0.087%
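
To give a sense of scale, the density figure can be combined with the dashboard prompt configuration. The sketch below is only a rough estimate: it assumes the 0.087% density was measured over the same 10,000 prompts of 128 tokens each, which the page does not state. The function name `expected_active_tokens` is made up for this illustration.

```python
# Figures copied from the dashboard: prompt sample size and activation density.
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.087 / 100  # 0.087% expressed as a fraction


def expected_active_tokens(num_prompts: int, tokens_per_prompt: int, density: float):
    """Return (total tokens, estimated tokens on which the feature fires).

    Assumption (not confirmed by the page): density is the fraction of
    tokens in this same sample with a nonzero activation.
    """
    total = num_prompts * tokens_per_prompt
    return total, total * density


total_tokens, active_tokens = expected_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY
)
print(total_tokens)          # 1280000
print(round(active_tokens))  # 1114
```

Under that assumption, the feature would fire on roughly 1,100 of the 1.28 million tokens in the sample.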