Explanations

eating disorders and extended fasts

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

" Jarvis"      -0.09
"Mez"          -0.09
" rand"        -0.09
" Fate"        -0.09
" Hubb"        -0.09
"CO"           -0.09
"geb"          -0.09
" sclerosis"   -0.09
"MSR"          -0.08
" epile"       -0.08

Positive Logits

"bul"          0.18
" eating"      0.17
" Eating"      0.16
"EDA"          0.16
"ED"           0.15
" Body"        0.15
" purge"       0.15
" body"        0.15
"/body"        0.15
"ED"           0.14
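The page does not say how these logit lists are produced. A common approach for SAE feature dashboards (assumed here, not confirmed by this page) is to project the feature's decoder vector through the model's unembedding matrix and report the tokens with the largest and smallest resulting values. The minimal sketch below illustrates that idea; the function name `top_logit_effects` and the parameters `w_dec_row`, `w_unembed`, and `vocab` are hypothetical, and the sketch ignores any final layer norm the real model applies before unembedding.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature's decoder direction.

    Hypothetical inputs (not taken from the dashboard):
      w_dec_row: (d_model,) decoder vector for a single SAE feature
      w_unembed: (d_model, vocab_size) unembedding matrix
      vocab:     list of token strings, length vocab_size
    Note: this ignores the final layer norm, so it is only an approximation.
    """
    logits = w_dec_row @ w_unembed          # (vocab_size,) logit contribution per token
    order = np.argsort(logits)              # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy usage with a 2-dimensional residual stream and a 3-token vocabulary.
pos, neg = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.5, -0.2, 0.1], [0.0, 0.0, 0.0]]),
    ["a", "b", "c"],
    k=2,
)
print(pos)  # tokens that the feature pushes up
print(neg)  # tokens that the feature pushes down
```

Under this reading, a list like the one above ("bul", " eating", " purge") would mean the feature's direction most directly raises the logits of eating-disorder vocabulary.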

Activation Density: 0.035%
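As a rough sense of scale: if the density were measured over the same dashboard sample (an assumption; the page does not say), 10,000 prompts × 128 tokens = 1,280,000 token positions, and 0.035% of that is about 448 positions where the feature fires. The sketch below shows one plausible way to compute such a density, treating any activation above a threshold of 0 as "firing"; the function name `activation_density` and the threshold choice are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where a feature's activation exceeds `threshold`.

    acts: array of feature activations, e.g. shape (num_prompts, tokens_per_prompt).
    The zero threshold is an assumption; a dashboard may use a different cutoff.
    """
    acts = np.asarray(acts)
    return float((acts > threshold).mean())


# Toy usage: 10 prompts of 128 tokens, feature active on 5 positions.
acts = np.zeros((10, 128))
acts[0, :5] = 1.0
print(activation_density(acts))  # 5 / 1280 active positions
```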