Explanations

mutual respect and praise

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token            Logit
" loved"         -0.11
" Loved"         -0.11
" beloved"       -0.10
"atar"           -0.10
" pending"       -0.10
"ynam"           -0.10
" Fitzgerald"    -0.09
"equ"            -0.09
" skeletons"     -0.09
" seemingly"     -0.09

Positive Logits

Token            Logit
" mutual"        0.17
"adm"            0.14
" cord"          0.14
"互"             0.12
" respect"       0.12
" Mutual"        0.12
"mut"            0.11
" colleg"        0.11
" friendly"      0.11
" respects"      0.11
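
Byte-level BPE vocabularies store non-ASCII tokens as remapped byte strings, which is why the character 互 ("mutual") can show up in a raw export as `äºĴ`. The sketch below decodes such strings. It assumes a GPT-2-style byte-to-unicode table, which the source does not confirm for this model, and the names `bytes_to_unicode`, `CHAR_TO_BYTE`, and `decode_token` are illustrative rather than taken from any dashboard API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style map from each byte value to a printable character (assumed scheme)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    # Printable bytes map to themselves.
    mapping = {b: chr(b) for b in printable}
    # Remaining bytes (control characters, space, etc.) are shifted to U+0100 and above.
    shift = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(raw: str) -> str:
    """Turn a raw byte-level vocabulary string back into readable text."""
    data = bytes(CHAR_TO_BYTE[ch] for ch in raw)
    return data.decode("utf-8", errors="replace")


print(decode_token("äºĴ"))            # 互
print(repr(decode_token("Ġmutual")))  # ' mutual'
```

The same table explains the leading spaces in the token lists: in the raw vocabulary a leading space is written as `Ġ`, so `Ġmutual` and `mut` are distinct tokens.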

Activations Density 0.055%
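
To put the density figure in concrete terms, the sketch below converts it into an approximate token count. It assumes the density was measured over the same dashboard prompt set listed under Configuration (10,000 prompts of 128 tokens each), which the page does not state explicitly. The variable names are illustrative.

```python
# Figures taken from the dashboard configuration and density readout.
num_prompts = 10_000
tokens_per_prompt = 128
density_percent = 0.055

# Assumption: density is the share of dashboard tokens on which the feature fires.
total_tokens = num_prompts * tokens_per_prompt
estimated_active = total_tokens * density_percent / 100

print(total_tokens)             # 1280000
print(round(estimated_active))  # 704
```

Under that assumption the feature fires on roughly 700 of the 1.28 million tokens, which is consistent with a sparse, narrowly scoped feature.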