Explanations

"a" followed by an adjective

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

Token               Logit
"izm"               -0.11
"icals"             -0.09
"RIES"              -0.09
" latent"           -0.09
" backbone"         -0.09
"ìĤ"                -0.09
" Seks"             -0.09
" heartbreaking"    -0.09
" Bernstein"        -0.09

Positive Logits

Token               Logit
" relief"            0.13
" matter"            0.13
" struggle"          0.12
" turning"           0.11
" sight"             0.11
" exagger"           0.11
" isol"              0.11
" feeling"           0.10
" moment"            0.10
" toss"              0.10
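Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the resulting logit effect. The page does not say how these particular values were computed, so the following is a minimal sketch under that assumption. The function name `top_logits` and the toy decoder/unembedding numbers are hypothetical and chosen only for illustration.

```python
def top_logits(decoder_vec, unembed, vocab, k=3):
    """Rank tokens by the direct logit effect of a feature's decoder direction.

    Assumption (not stated on the page): logit effect = decoder_vec @ unembed.
    decoder_vec: list[float] of length d_model (the feature's decoder row)
    unembed:     d_model rows, each a list[float] of length len(vocab)
    vocab:       list[str], one entry per vocabulary token
    """
    d_model = len(decoder_vec)
    # Effect on token j is the dot product of decoder_vec with column j of unembed.
    effects = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]           # most negative effect first
    positive = ranked[::-1][:k]     # most positive effect first
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
decoder = [1.0, -0.5]
unembed = [
    [0.2, -0.1, 0.02, 0.0],
    [0.1, 0.3, -0.2, 0.4],
]
vocab = [" relief", " latent", " matter", "izm"]

pos, neg = top_logits(decoder, unembed, vocab, k=2)
print([tok for tok, _ in pos])  # [' relief', ' matter']
print([tok for tok, _ in neg])  # [' latent', 'izm']
```

Real dashboards may additionally apply normalization (for example, folding in a final layer norm) before ranking; that detail is omitted here.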

Activation Density: 0.039%
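Activation density is usually the fraction of tokens on which the feature fires (activation above zero). The sketch below computes it and, assuming the density was measured over exactly the dashboard's 10,000 prompts of 128 tokens each (the page does not confirm this), converts the 0.039% figure into an approximate count of firing tokens. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy example: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 0.25

# Dashboard configuration from this page: 10,000 prompts x 128 tokens each.
total_tokens = 10_000 * 128  # 1,280,000 tokens
# Assumption: the 0.039% density was measured over exactly these tokens.
expected_firing = round(0.00039 * total_tokens)
print(total_tokens, expected_firing)  # 1280000 499
```

Under that assumption, the feature fires on roughly 499 of the 1,280,000 sampled tokens, which is consistent with a narrow pattern such as "a" followed by an adjective.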