Explanations

ingredient name or line

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token                 Logit
"ingly"               -0.10
"cot"                 -0.10
"ArgumentException"   -0.10
"/preferences"        -0.09
" stabbing"           -0.09
"ablo"                -0.09
"iche"                -0.09
" DISCLAIM"           -0.09
"orca"                -0.09

Positive Logits

Token                 Logit
"ial"                 0.16
"ially"               0.15
"ials"                0.14
"icide"               0.13
"itial"               0.11
"es"                  0.11
"za"                  0.11
"IAL"                 0.11
"ents"                0.10
"ments"               0.09

Activations Density 0.027%
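
For a rough sense of scale, the density figure can be converted into a token count. The sketch below assumes that density is the fraction of tokens in the dashboard prompt set (10,000 prompts of 128 tokens each) on which the feature activates. The dashboard does not state this definition, and the function name is hypothetical.

```python
def estimated_active_tokens(num_prompts: int, tokens_per_prompt: int,
                            density_percent: float) -> tuple[int, float]:
    """Return (total tokens, estimated number of tokens where the feature fires).

    Assumption: density_percent is the share of all dashboard tokens with a
    nonzero activation. This definition is not confirmed by the dashboard.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens


# Values taken from the Configuration and Activations Density fields above.
total, active = estimated_active_tokens(10_000, 128, 0.027)
print(f"{total} tokens in total, about {active:.1f} with a nonzero activation")
```

Under that assumption the feature would fire on roughly 346 of 1,280,000 tokens, which is consistent with a sparse feature.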