Explanations

gentle slower steady preserve steady

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

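Negative Logits

Quotation marks show each token's exact text, including any leading space.

Token           Logit
"raw"           -0.11
" dynam"        -0.11
" masculine"    -0.10
" há»"          -0.10   (ends partway through a multi-byte character)
" subtly"       -0.10
" exped"        -0.09
"Bold"          -0.09
"粗"            -0.09   (Chinese: "coarse")
" mạnh"         -0.09   (Vietnamese: "strong")
" kinetic"      -0.09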
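Several tokens in these lists appear to come from a byte-level BPE vocabulary, where each raw byte is drawn as a printable stand-in character. A scraped token such as `ç²Ĺ` is then really the UTF-8 bytes of 粗, and ` há»` stops partway through a multi-byte character. The page does not name the tokenizer, so the following is a minimal sketch assuming the GPT-2-style byte-to-character table (the garbled tokens here do decode cleanly under it). The helper names `bytes_to_unicode` and `decode_token` are illustrative.

```python
def bytes_to_unicode() -> dict:
    """Byte -> stand-in character table used by GPT-2-style byte-level BPE.

    Printable Latin-1 bytes map to themselves; the remaining 68 bytes are
    shifted to code points 256 and up (0x20, the space, becomes 'Ġ').
    """
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    code_points = byte_values[:]
    shift = 0
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return dict(zip(byte_values, map(chr, code_points)))


def decode_token(token: str) -> str:
    """Turn a displayed byte-level token back into readable text."""
    byte_decoder = {ch: b for b, ch in bytes_to_unicode().items()}
    # Assumption: the page already renders 'Ġ' as a plain leading space.
    byte_decoder[" "] = 0x20
    raw = bytes(byte_decoder[ch] for ch in token)
    # A token can end mid-character; incomplete bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace")


for shown in ["ç²Ĺ", "æŁĶ", "æħ¢", " máº¡nh", " há»"]:
    print(repr(shown), "->", repr(decode_token(shown)))
```

Under this scheme the three CJK tokens decode to 粗, 柔 and 慢, ` máº¡nh` decodes to the Vietnamese ` mạnh`, and ` há»` decodes to ` h` followed by a replacement character because its last character is cut off.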

Positive Logits

Token           Logit
" slower"        0.24
" slow"          0.24
" gent"          0.23
" gentle"        0.20
"gent"           0.19
"slow"           0.18
" softer"        0.16
"柔"             0.15   (Chinese: "soft")
" steady"        0.15
"慢"             0.15   (Chinese: "slow")
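Logit lists like these are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and reading off the most positive and most negative entries. The page does not spell out its pipeline (for example, whether a final layer norm is folded in first), so the following is a minimal pure-Python sketch under that assumption. The function `feature_logit_effects`, its parameters, and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score every token by the dot product of the feature's decoder
    direction (length d_model) with that token's unembedding column.

    Assumption: the logit lists come from decoder_row @ W_U with no
    extra normalization; the real dashboard pipeline may differ.
    """
    d_model = len(decoder_row)
    scores = [
        sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]        # most negative first
    top_positive = ranked[::-1][:k]  # most positive first
    return top_positive, top_negative


# Toy example: d_model = 2, four-token vocabulary (all values hypothetical).
vocab = [" slow", " gentle", " kinetic", "Bold"]
unembed = [
    [1.0, 0.5, -0.5, -1.0],  # row 0 of W_U (d_model x vocab)
    [0.5, 0.5, -0.5, 0.0],   # row 1 of W_U
]
positive, negative = feature_logit_effects([0.2, 0.1], unembed, vocab, k=2)
print(positive)
print(negative)
```

Because each score is a single dot product, a feature whose decoder direction aligns with the unembedding columns of " slow" and " gentle" pushes those tokens up, while tokens whose columns point the other way land in the negative list.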

Activation Density

0.222%
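Activation density is usually the share of token positions on which the feature's activation is above zero. The page does not state the threshold or which tokens were counted, so this is a minimal sketch assuming a threshold of 0. The function name `activation_density` and the toy activations are hypothetical.

```python
from typing import Sequence


def activation_density(
    activations: Sequence[Sequence[float]], threshold: float = 0.0
) -> float:
    """Fraction of token positions where the feature activates above
    `threshold`. Assumption: any activation > 0 counts as firing."""
    total = sum(len(prompt) for prompt in activations)
    if total == 0:
        return 0.0
    firing = sum(1 for prompt in activations for a in prompt if a > threshold)
    return firing / total


# Two hypothetical 4-token prompts; the feature fires on 2 of 8 positions.
acts = [[0.0, 1.2, 0.0, 0.0], [0.0, 0.0, 0.3, 0.0]]
density = activation_density(acts)
print(f"{100 * density:.3f}%")  # prints 25.000%
```

At the reported 0.222%, the feature fires on roughly 2 of every 1,000 tokens.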