Explanations

hunger and appetite

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token          Logit
" Morrow"      -0.10
"°}"           -0.09
"Vul"          -0.09
" Pruitt"      -0.09
"eff"          -0.09
" pulp"        -0.08
" drinking"    -0.08
" Variant"     -0.08
"/|"           -0.08

Positive Logits

Token          Logit
" hunger"      0.47
" Hunger"      0.44
" hungry"      0.42
" голод"       0.39
" Hung"        0.37
"Hung"         0.36
"hung"         0.35
" appetite"    0.32
" stomach"     0.28
"unger"        0.27

Activations Density: 0.092%