Explanations

to obtain

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token           Logit
ANTE            -0.10
opal            -0.09
agne            -0.09
idis            -0.09
แท              -0.09
 authorised     -0.09
 pressing       -0.09
 Confidential   -0.08
 accepted       -0.08
Dud             -0.08

Positive Logits

Token           Logit
 obtain         0.28
 obtaining      0.28
 obtained       0.27
Obt             0.24
 Obtain         0.24
 obten          0.23
obt             0.22
 obtains        0.21
取得            0.19
 obtener        0.19

Activations Density 0.064%
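
As a rough sanity check on the density figure, the sketch below converts it into an approximate token count. It assumes the density is the fraction of dashboard tokens on which the feature fires, measured over the 10,000 prompts of 128 tokens listed under Configuration. The page does not state this, so treat the result as an estimate. The function name `estimated_active_tokens` is hypothetical.

```python
# Figures copied from the dashboard page.
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.064  # reported "Activations Density"


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens with a nonzero activation).

    Assumption: density is measured per token over the dashboard prompts.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = round(total_tokens * density_percent / 100)
    return total_tokens, active_tokens


total_tokens, active_tokens = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(total_tokens, active_tokens)  # 1280000 819
```

Under that assumption the feature would fire on roughly 819 of 1,280,000 tokens, which fits a narrow feature tied to forms of "obtain".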