Explanations

hobby and hobbies

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

NEGATIVE LOGITS

Token         Logit
"al"          -0.10
"Fog"         -0.09
"sto"         -0.09
"unes"        -0.09
"er"          -0.09
"+m"          -0.08
" bathing"    -0.08
" boring"     -0.08
" prom"       -0.08

POSITIVE LOGITS

Token            Logit
" Hobby"         0.16
"horse"          0.16
" interests"     0.16
" hobby"         0.14
" hobbies"       0.14
" activities"    0.14
"obbies"         0.13
"趣"             0.13
" outside"       0.13
"activities"     0.12

Activations Density 0.052%
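
To put the density figure in concrete terms, here is a minimal sketch. It assumes that activation density means the fraction of token positions on which the feature fires, and that it is measured over the dashboard prompts listed under Configuration. The page confirms neither assumption. The function name activation_density and the threshold parameter are hypothetical and are not part of any dashboard API.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    `activations` is a list of prompts, each a list of per-token activation
    values for one feature. This definition of density is an assumption.
    """
    total = 0
    active = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Token budget implied by the configuration: 10,000 prompts of 128 tokens each.
total_tokens = 10_000 * 128

# Token positions that would be active at the reported 0.052% density,
# assuming the density uses this same token budget as its denominator.
expected_active = total_tokens * 0.052 / 100
```

Under those assumptions, 0.052% of 1,280,000 token positions is about 666 active tokens, which would make this a sparse feature.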