Explanations

identifying reasons for

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token        Logit
"عا"         -0.10
" kayn"      -0.09
".ToBoolean" -0.09
"_SRV"       -0.09
"TZ"         -0.08
" kolo"      -0.08
" forc"      -0.08
" المنت"     -0.08
"ads"        -0.08
"ledi"       -0.08

Positive Logits

Token         Logit
"why"         0.24
"why"         0.20
"Why"         0.16
"为什么"       0.16
" existence"  0.14
"Why"         0.14
" pourquoi"   0.13
" Exist"      0.12
" visit"      0.12
"WHY"         0.11
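
For readers unfamiliar with these lists, one common way such per-token logit effects are obtained is by projecting a feature's decoder direction through the model's unembedding matrix and sorting the result. The sketch below illustrates that idea only; it is an assumption that this dashboard computes its values this way, and `top_logit_tokens`, `decoder_dir`, `unembed`, and every number in the toy data are hypothetical, not values from the dashboard.

```python
import numpy as np

def top_logit_tokens(decoder_dir, unembed, vocab, k=3):
    """Return (most boosted, most suppressed) tokens for one feature direction."""
    # Project the feature direction onto every vocabulary logit.
    effects = decoder_dir @ unembed            # shape: (vocab_size,)
    order = np.argsort(effects)                # ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy data: token strings borrowed from the lists above, numbers invented.
vocab = ["why", " pourquoi", " visit", "TZ", "ads"]
unembed = np.array([
    [1.0, 0.5, 0.25, -0.25, -0.5],
    [0.0, 0.5, 0.0, 0.5, 0.0],
])                                             # shape: (d_model, vocab_size)
decoder_dir = np.array([0.5, 0.0])             # hypothetical feature direction

positive, negative = top_logit_tokens(decoder_dir, unembed, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Under this reading, a positive value means the feature pushes the model toward predicting that token when it fires, and a negative value means it pushes away from it.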

Activations Density: 0.083%
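
To give the density figure a sense of scale, the arithmetic below estimates how many tokens that percentage would correspond to. It is a rough sketch that assumes the density is the fraction of tokens on which the feature is active and that it was measured over the dashboard prompt set listed under Configuration; neither assumption is confirmed by the page itself.

```python
# Figures taken from the Configuration and density lines of this page.
n_prompts = 10_000
tokens_per_prompt = 128
density = 0.083 / 100                      # 0.083% expressed as a fraction

# Assumption: density = active tokens / total tokens over the dashboard prompts.
total_tokens = n_prompts * tokens_per_prompt
expected_active = total_tokens * density

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {expected_active:.0f}")
```

On those assumptions, the feature would fire on roughly one token in every 1,200.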