Explanations

assigning category or type

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token               Logit
" subdivisions"     -0.10
"ocabulary"         -0.09
" wording"          -0.09
" straw"            -0.09
" vocabulary"       -0.09
" Giuliani"         -0.09
" Sylv"             -0.09
" Straw"            -0.08
" Birth"            -0.08
"éħ"                -0.08

Positive Logits

Token               Logit
" category"         0.27
" type"             0.24
"category"          0.19
".category"         0.18
" genre"            0.17
" категор"          0.17
" class"            0.17
"type"              0.16
"类型"              0.16
" nature"           0.16

Activations Density: 0.256%
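
As a rough sanity check on these figures, the sketch below estimates how many token positions would activate the feature if the 0.256% density were measured over the dashboard's 10,000 prompts of 128 tokens each. The page does not say which token set the density was computed over, so pairing the two numbers is an assumption, and the helper name `expected_active_tokens` is hypothetical.

```python
def expected_active_tokens(num_prompts: int,
                           tokens_per_prompt: int,
                           density_percent: float) -> float:
    """Estimate how many token positions activate a feature.

    Assumes density_percent is the share of all token positions on
    which the feature fires, expressed as a percentage.
    """
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens * density_percent / 100.0


# Values copied from the dashboard; treating them as describing the
# same token set is an assumption, not something the page states.
estimate = expected_active_tokens(10_000, 128, 0.256)
print(f"{estimate:.0f} of {10_000 * 128:,} token positions")
```

Under that assumption the feature would fire on roughly 3,300 of the 1.28 million token positions, which fits a sparse feature tied to a narrow concept such as "category or type".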