INDEX

Explanations

purposeful actions after "to"

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

"EMPLARY"   -0.10
"oze"       -0.09
"Latch"     -0.09
"stead"     -0.09
"umbs"      -0.09
"oret"      -0.09
"thon"      -0.08
"žel"       -0.08
"eca"       -0.08
"oders"     -0.08

Positive Logits

" help"     0.16
"-date"     0.16
"ting"      0.15
" rival"    0.15
"iling"     0.15
" enable"   0.14
"aid"       0.14
"asting"    0.13
"tes"       0.13
" boot"     0.13

Activations Density 0.095%
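
To give a rough sense of scale for the density figure, the sketch below converts it into an approximate token count. It assumes, without confirmation from this page, that the density is the fraction of tokens in the dashboard prompt set (10,000 prompts of 128 tokens each) on which the feature has a nonzero activation. The variable names are illustrative only.

```python
# Assumption: activation density = share of dashboard tokens where the
# feature fires. The page does not state how the figure is computed.
NUM_PROMPTS = 10_000        # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128     # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.095     # from "Activations Density"

# Total tokens in the dashboard prompt set.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Approximate number of tokens on which the feature would be active.
expected_active = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {round(expected_active):,}")
```

Under that assumption, the feature would fire on roughly one token in a thousand, which is consistent with a sparse, narrowly scoped feature.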