INDEX

Explanations

need for help or action

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token           Logit
"utters"        -0.10
"oni"           -0.10
"etes"          -0.10
"ETY"           -0.09
"estre"         -0.09
"fty"           -0.09
"asin"          -0.09
"_NAMESPACE"    -0.09
"emma"          -0.09
"adia"          -0.09

Positive Logits

Token            Logit
" help"          0.24
"lessly"         0.22
"/w"             0.22
" assistance"    0.20
" Help"          0.16
"help"           0.16
" Assistance"    0.14
"/W"             0.14
"(ed"            0.13

Activations Density 0.038%
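
To put the density figure in context, here is a rough sketch of how many tokens it corresponds to. It assumes the density is measured over the dashboard prompts listed under Configuration (10,000 prompts of 128 tokens each); the page does not state this explicitly. The function name is hypothetical.

```python
def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens on which the feature fires).

    Assumption: density is the percentage of all dashboard tokens
    with a nonzero activation for this feature.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens


# Values taken from the Configuration and Activations Density entries.
total, active = estimated_active_tokens(10_000, 128, 0.038)
print(f"{total} tokens total, roughly {round(active)} with nonzero activation")
```

Under that assumption, the feature fires on only a few hundred of the 1.28 million dashboard tokens, which fits a narrowly scoped feature such as "need for help or action".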