Explanations

mental health conditions and personality

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token             Logit
" paranoid"       -0.10
"unp"             -0.10
" productive"     -0.09
" suicidal"       -0.09
"CLR"             -0.09
" psychology"     -0.09
" potentially"    -0.09
" paranoia"       -0.08
" Psychology"     -0.08
" relations"      -0.08

Positive Logits

Token              Logit
" perfection"      0.16
" underlying"      0.13
" personality"     0.13
" lack"            0.13
" upbringing"      0.12
"low"              0.12
"ogi"              0.12
" histories"       0.11
" Personality"     0.11
" personalities"   0.11

Activations Density: 0.085%
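
To put the density figure in context, the short sketch below converts it into an approximate token count for the dashboard prompt set. It assumes, and this is not stated on the page, that the density is the fraction of token positions across the 10,000 x 128-token dashboard prompts on which the feature has a nonzero activation. The function name `expected_active_tokens` is hypothetical and used only for illustration.

```python
def expected_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total token positions, approximate number of active positions).

    Assumption: density_percent is the share of token positions where the
    feature fires, measured over the same prompt set.
    """
    total = num_prompts * tokens_per_prompt
    active = total * density_percent / 100.0
    return total, active


# Values taken from the Configuration and Activations Density lines.
total_tokens, active_tokens = expected_active_tokens(10_000, 128, 0.085)
print(f"total token positions: {total_tokens}")
print(f"approx. active positions: {round(active_tokens)}")
```

Under that assumption, the feature would fire on roughly one token position in every 1,200, which fits a narrowly scoped feature.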