Explanations

supporting evidence and sources

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token        Logit
"ool"        -0.10
"aden"       -0.09
"ASN"        -0.09
"pic"        -0.09
"wis"        -0.08
" ç¿"        -0.08
" rout"      -0.08
"ovich"      -0.08
":;\n"       -0.08
"/g"         -0.08

Positive Logits

Token            Logit
" supporting"    0.33
" support"       0.31
" backing"       0.29
"支持"           0.29
" supports"      0.28
"support"        0.28
"Support"        0.25
" поддерж"       0.25
" hỗ"            0.24
" Supporting"    0.24

Activation Density: 0.107%
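
To put the 0.107% density in concrete terms, the short sketch below converts it into an approximate token count. It rests on two assumptions that the page does not state: that density means the fraction of token positions on which the feature has a nonzero activation, and that it was measured over the dashboard prompt set listed under Configuration (10,000 prompts of 128 tokens each). The function name `expected_active_tokens` is hypothetical and used only for illustration.

```python
def expected_active_tokens(n_prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> float:
    """Estimate how many token positions activate the feature.

    Assumption: density is the percentage of all token positions
    on which the feature's activation is nonzero.
    """
    total_tokens = n_prompts * tokens_per_prompt
    return total_tokens * density_percent / 100.0


# Values taken from the Configuration and density lines of the page.
total = 10_000 * 128
active = expected_active_tokens(10_000, 128, 0.107)
print(f"{total:,} tokens in total, roughly {active:,.0f} with a nonzero activation")
```

Under those assumptions the feature would fire on about 1,370 of the 1,280,000 token positions, which fits a sparse feature tied to a narrow concept such as "support".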