INDEX

Explanations

possessive + action or quality

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

"戒"          -0.09
"akk"         -0.09
"afil"        -0.08
"Smy"         -0.08
" goodwill"   -0.08
" blame"      -0.08
" inher"      -0.08
"Fir"         -0.08
" sincere"    -0.08
"svm"         -0.08

Positive Logits

" efforts"         0.24
" contribution"    0.22
" role"            0.20
" contributions"   0.19
" sake"            0.19
" effort"          0.17
" actions"         0.17
" part"            0.16
" help"            0.15
" work"            0.15

Activations Density 0.033%
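
To give the density figure a sense of scale, here is a minimal sketch that turns it into an approximate token count. It assumes that the density is the fraction of token positions with a nonzero activation, and that it was measured over the dashboard prompt set listed under Configuration (10,000 prompts of 128 tokens each). The page states neither of these, so treat the result as an estimate only. The function name is hypothetical.

```python
# Assumption (not stated on the page): density = share of token positions
# where the feature has a nonzero activation, measured over the dashboard
# prompt set of 10,000 prompts x 128 tokens.
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.033


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total token positions, estimated positions where the feature fires)."""
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = round(total_tokens * density_percent / 100)
    return total_tokens, active_tokens


total, active = estimated_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)
print(f"{total:,} token positions, roughly {active} with a nonzero activation")
```

Under these assumptions the feature fires on only a few hundred of about 1.28 million token positions, which fits a narrow explanation such as "possessive + action or quality".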