Explanations

standing by no matter what

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

 collaps

-0.09

átka

-0.09

 unre

-0.09

ursed

-0.08

 horn

-0.08

 handed

-0.08

akat

-0.08

alem

-0.08

SEG

-0.08

 uncert

-0.08

Positive Logits

 loyalty

0.25

loy

0.23

 loyal

0.22

Loy

0.22

LOY

0.18

 backs

0.15

 stick

0.15

 support

0.15

 Stick

0.15

 defend

0.14

Activations Density 0.050%