INDEX

Explanations

modal verbs followed by potential actions

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token            Logit
"ANDLE"          -0.09
"uits"           -0.09
"olo"            -0.09
"otic"           -0.09
"ERENCE"         -0.09
"innie"          -0.09
"yles"           -0.08
" Nicholson"     -0.08
"uada"           -0.08
"anned"          -0.08

Positive Logits

Token            Logit
" might"         0.19
" will"          0.18
" likely"        0.17
" sẽ"            0.15   (Vietnamese: "will")
"might"          0.14
"一定"           0.14   (Chinese: "definitely")
" appreciate"    0.14
" would"         0.14
" enjoy"         0.14
" enjoyed"       0.13

Activations Density 0.077%
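
To put the density figure in context, it can be combined with the dashboard configuration (10,000 prompts, 128 tokens each). The sketch below assumes that density means the fraction of sampled tokens on which the feature has a nonzero activation; the page itself does not define the term, so read the result as a rough estimate. The function name is hypothetical and chosen only for illustration.

```python
# Assumption (not stated on the page): activation density is the share of
# sampled tokens on which the feature fires (nonzero activation).

PROMPTS = 10_000           # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128    # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.077    # from "Activations Density"


def estimated_active_tokens(prompts, tokens_per_prompt, density_percent):
    """Return (total sampled tokens, estimated tokens where the feature fires)."""
    total_tokens = prompts * tokens_per_prompt
    active_tokens = round(total_tokens * density_percent / 100)
    return total_tokens, active_tokens


total, active = estimated_active_tokens(PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)
print(f"{total:,} tokens sampled, roughly {active:,} with a nonzero activation")
```

Under that assumption the feature fires on roughly one token in every 1,300, which fits a feature tied to a narrow context such as modal verbs followed by potential actions.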