Explanations

enjoyment of simple things

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token            Logit
"ream"           -0.09
"rint"           -0.09
"gaz"            -0.09
"_exempt"        -0.09
" naughty"       -0.09
" sanit"         -0.09
" deep"          -0.09
"ActionButton"   -0.08
"×\n\n"          -0.08
" clin"          -0.08

Positive Logits

Token            Logit
" enjoyment"     0.11
"TBD"            0.10
" simple"        0.10
"享"             0.09
" mysterious"    0.09
" interesting"   0.09
"unya"           0.09
" interested"    0.09
" acknow"        0.08
" Lambert"       0.08

Activations Density 0.093%
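
As a rough sanity check on the density figure, the sketch below estimates how many tokens the feature fires on. It assumes the density is the percentage of tokens with a nonzero activation and that it was measured over the dashboard prompts listed under Configuration; the page states neither, and the function name is hypothetical.

```python
def estimated_active_tokens(num_prompts: int, tokens_per_prompt: int,
                            density_percent: float) -> int:
    """Estimate the number of tokens on which a feature is active.

    Assumption: density_percent is the share of all tokens (in percent)
    with a nonzero activation, measured over the same prompt set.
    """
    total_tokens = num_prompts * tokens_per_prompt
    return round(total_tokens * density_percent / 100)


# Figures taken from the Configuration and Activations Density entries.
total = 10_000 * 128
active = estimated_active_tokens(10_000, 128, 0.093)
print(f"{active} of {total} tokens")
```

Under these assumptions the feature would be active on roughly 1,190 of the 1,280,000 dashboard tokens, which is consistent with a sparse feature.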