Explanations

who is perceived or different

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

"inz"          -0.10
"ï¾Į"          -0.10
"ullet"        -0.10
" spirit"      -0.09
" estr"        -0.09
"Ã«l"          -0.09
" abusive"     -0.08
"oken"         -0.08
"flen"         -0.08
" harmful"     -0.08

Positive Logits

" perceived"        0.20
" slightest"        0.14
" appearance"       0.12
" unconventional"   0.12
" accents"          0.12
" insufficient"     0.11
" fail"             0.10
"upp"               0.10
" perfectly"        0.10
" Simply"           0.10

Activations Density 0.142%