Explanations

that can or must

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token         Logit
" unlawful"   -0.10
" Ernst"      -0.10
"inati"       -0.10
"Pra"         -0.10
" ReadOnly"   -0.09
"ilio"        -0.09
"mani"        -0.09
"ussy"        -0.09
"igon"        -0.09
"wh"          -0.08

Positive Logits

Token         Logit
" allowed"    0.19
"allowed"     0.17
"Allowed"     0.14
" permitted"  0.14
" Allowed"    0.13
" before"     0.13
" maximum"    0.12
" tolerated"  0.11
" Maximum"    0.11
"允"          0.11

Activations Density: 0.112%
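
As a rough illustration of what this density figure means, the sketch below assumes that density is the fraction of token positions with a nonzero feature activation, and that it was measured over the dashboard prompts listed under Configuration. Both are assumptions, not something this page states. The helper `activation_density` is a hypothetical name and not part of any library.

```python
# Assumption: density = fraction of token positions where the feature fires (> 0),
# measured over the dashboard prompts described under Configuration.
NUM_PROMPTS = 10_000       # "10,000 prompts"
TOKENS_PER_PROMPT = 128    # "128 tokens each"
DENSITY = 0.112 / 100      # reported density, converted from percent to a fraction


def activation_density(activations):
    """Return the fraction of token positions with a positive activation.

    `activations` is a list of prompts, each a list of per-token activation values.
    """
    flat = [a for prompt in activations for a in prompt]
    return sum(1 for a in flat if a > 0) / len(flat)


# Under the assumptions above, estimate how many dashboard tokens activate the feature.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
approx_active_tokens = round(DENSITY * total_tokens)

print(total_tokens, approx_active_tokens)
```

If those assumptions hold, the feature fires on roughly one token in every 900, which would fit a feature tied to a narrow set of permission-related words.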