INDEX

Explanations

a sign, individuals, white

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

Token        Logit
" Ukraj"     -0.10
" clashes"   -0.10
"pdo"        -0.09
"'gc"        -0.09
"巡"         -0.09
"owell"      -0.09
"ipel"       -0.09
"ucci"       -0.08
" searcher"  -0.08

Positive Logits

Token          Logit
" prim"        0.13
" height"      0.12
" experiment"  0.12
"-height"      0.11
"men"          0.11
" male"        0.11
"height"       0.11
" Height"      0.11
" heartbeat"   0.11
"恐"           0.10

Activations Density 0.040%
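
To put the density figure in context, here is a rough sketch of the arithmetic, assuming that "Activations Density" means the fraction of sampled tokens on which the feature has a nonzero activation. That definition is an assumption; the dashboard does not state it. The function name expected_active_tokens is hypothetical. Only the prompt count, token length, and density value come from the dashboard.

```python
# Values taken from the dashboard configuration above.
PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.040

def expected_active_tokens(prompts, tokens_per_prompt, density_percent):
    """Return (total tokens sampled, approximate number of active tokens).

    Assumption: density = active tokens / total tokens sampled.
    """
    total = prompts * tokens_per_prompt
    # Round to the nearest whole token to absorb floating-point error.
    return total, round(total * density_percent / 100)

total_tokens, active_tokens = expected_active_tokens(
    PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(total_tokens, active_tokens)
```

Under that assumption, the feature would fire on roughly 512 of the 1,280,000 sampled tokens. The displayed percentage is rounded, so the true count may differ slightly.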