Explanations

the beginning of definitive statements

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token                Logit
"Disappear"          -0.10
(blank in source)    -0.09
(blank in source)    -0.08
"Naj"                -0.08
"ittest"             -0.08
"liest"              -0.08
"ôt"                 -0.08
" EntryPoint"        -0.08
" renamed"           -0.08
"uele"               -0.08

Positive Logits

Token                Logit
" answer"            0.21
"oret"               0.19
"orem"               0.18
" question"          0.17
" short"             0.17
"ories"              0.16
"answer"             0.16
" term"              0.16
"irs"                0.16
"atre"               0.15

Activations Density 0.147%
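
As a rough sanity check, the density figure can be related to the dashboard sample size given under Configuration. The sketch below assumes that the density is the fraction of tokens in the dashboard prompts on which the feature has a nonzero activation, and that every prompt contributes exactly 128 tokens. The page states neither of these, and the variable names are illustrative.

```python
# Assumption: density = (tokens with nonzero activation) / (all dashboard tokens).
# Assumption: all 10,000 prompts are exactly 128 tokens long.
n_prompts = 10_000
tokens_per_prompt = 128
density = 0.147 / 100  # 0.147% expressed as a fraction

total_tokens = n_prompts * tokens_per_prompt
expected_active = total_tokens * density

print(f"total tokens in sample: {total_tokens:,}")
print(f"approx. tokens where the feature fires: {round(expected_active):,}")
```

Under these assumptions the feature would fire on roughly 1,900 of the 1.28 million sampled tokens, so it is sparse but far from rare.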