Explanations

boolean evaluation manner

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token           Logit
" Beng"         -0.09
"ro"            -0.09
"umer"          -0.09
" Fran"         -0.09
" Barg"         -0.08
".argument"     -0.08
" Bened"        -0.08
".Itoa"         -0.08
" Babylon"      -0.08
" Lifecycle"    -0.08

Positive Logits

Token           Logit
" truth"        0.28
" Truth"        0.24
" bool"         0.24
" boolean"      0.24
" Boolean"      0.24
"Truth"         0.23
"Boolean"       0.22
" True"         0.21
"bool"          0.20
" Bool"         0.20

Activations Density 0.069%
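
To put the density figure in context, the sketch below assumes that activation density means the fraction of token positions on which the feature fires, and that it was measured over the dashboard prompts listed under Configuration. Neither assumption is confirmed by this page. The function name `activation_density` and the zero threshold are hypothetical, chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Hypothetical helper: the dashboard's exact definition is not stated.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Figures taken from the Configuration section and the density line.
PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
total_tokens = PROMPTS * TOKENS_PER_PROMPT

# 0.069% expressed as a fraction.
reported_density = 0.069 / 100

# Assumption: density was computed over exactly these dashboard tokens.
approx_active_tokens = round(total_tokens * reported_density)

print(total_tokens, approx_active_tokens)
```

Under those assumptions the feature would fire on roughly 883 of 1,280,000 token positions, which fits a sparse feature tied to a narrow set of tokens such as " truth" and " boolean".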