Explanations

anti- followed by words

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

"ous"       -0.10
"outers"    -0.10
"anko"      -0.09
"/dr"       -0.09
"LY"        -0.09
"esh"       -0.09
"curity"    -0.09
"ческое"    -0.09
"manent"    -0.09
" ãĦ"       -0.08

Positive Logits

"ทาน"            0.16
"ForgeryToken"   0.14
"uated"          0.13
"aging"          0.13
"Gravity"        0.12
"icrobial"       0.12
"heroes"         0.12
" gravity"       0.12
" aging"         0.12
"ipated"         0.11

Activations Density: 0.015%