Explanations

son of parents or ancestors

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token         Logit
" unborn"     -0.10
"erate"       -0.10
"YPE"         -0.09
"一种"        -0.09
" student"    -0.08
" непри"      -0.08
"onic"        -0.08
" å£"         -0.08
"YP"          -0.08
"ype"         -0.08

Positive Logits

Token           Logit
" parents"      0.18
"Parents"       0.14
" Parents"      0.14
"parents"       0.13
" bitch"        0.11
" immigrants"   0.11
"ither"         0.10
" abusive"      0.10
" union"        0.10
" minor"        0.09

Activations Density 0.074%
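
To give the density figure a sense of scale, the following sketch converts it into an approximate token count. It assumes the density is the fraction of tokens with a nonzero activation and that it was measured over the dashboard prompts listed under Configuration; neither assumption is stated on this page, and the function name is hypothetical.

```python
# Values taken from this page; their interpretation is an assumption.
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.074  # reported activation density, in percent


def expected_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, approximate number of tokens where the feature fires)."""
    total = num_prompts * tokens_per_prompt
    return total, total * density_percent / 100


total_tokens, active_tokens = expected_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(total_tokens)          # 1280000
print(round(active_tokens))  # 947
```

Under these assumptions the feature would fire on roughly 950 of 1.28 million tokens, or about one token in every 1,350.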