Explanations

expressing willingness to help or know

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token       Logit
"zur"       -0.12
"what"      -0.09
"the"       -0.09
" dazu"     -0.09
"ral"       -0.09
"dat"       -0.09
"rale"      -0.09
"isse"      -0.09
"igen"      -0.09

Positive Logits

Token            Logit
" assistance"    0.24
" help"          0.23
" Assistance"    0.19
" talk"          0.16
"me"             0.13
" Hilfe"         0.13
" ayuda"         0.13
" discuss"       0.12
" follando"      0.12
" Help"          0.12

Activations Density 0.018%