Explanations

for all ages

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token        Logit
 crib        -0.09
oppers       -0.09
äd           -0.09
ssue         -0.09
 masculine   -0.09
luv          -0.08
/filepath    -0.08
 getpid      -0.08
airie        -0.08

Positive Logits

Token          Logit
 audiences     0.19
 consumption   0.15
PG             0.14
 younger       0.13
 faint         0.13
 audience      0.12
 minors        0.12
 Consumption   0.12
 ages          0.12
pg             0.12

Activations Density 0.062%