Explanations

which allow or enable

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Tokens are quoted so that leading spaces stay visible.

"nell"       -0.10
" necess"    -0.09
" dopad"     -0.09
"ddd"        -0.08
" whose"     -0.08
" imposs"    -0.08
"icos"       -0.08
"stell"      -0.08
"pond"       -0.08

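Positive Logits

Tokens are quoted so that leading spaces stay visible.

" allow"     0.31
" allows"    0.30
"allow"      0.29
" allowing"  0.28
" enable"    0.28
" enables"   0.27
" Allows"    0.25
"åħģ"        0.25  (byte-level form of 允, "allow", assuming a GPT-2-style byte-level tokenizer)
"allows"     0.24
"enable"     0.23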
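
The token åħģ in the positive list looks like mojibake, but it is most likely the raw byte-level form of a multi-byte character. The page does not name the model or tokenizer, so the sketch below assumes a GPT-2-style byte-level BPE vocabulary, where every byte is mapped to a printable stand-in character. Under that assumption, reversing the mapping recovers the original UTF-8 bytes. The function names are illustrative and not part of any dashboard API.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable text."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # A single token may hold only part of a character, so do not fail hard.
    return raw.decode("utf-8", errors="replace")


decoded = decode_token("åħģ")
print(decoded)
```

If the assumption holds, the token decodes to the Chinese character 允 ("allow, permit"), which fits the other tokens this feature promotes.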

Activations Density: 0.128%
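
To give the density figure some scale, here is a minimal sketch of what it implies. It assumes the density is the fraction of token positions with a non-zero activation and that it was measured over the dashboard prompts listed under Configuration. The page states neither, so treat the resulting count as a rough estimate. `activation_density` is a hypothetical helper, not a dashboard function.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    `activations` is a list of prompts, each a list of per-token activations.
    """
    flat = [a for prompt in activations for a in prompt]
    if not flat:
        return 0.0
    return sum(1 for a in flat if a > threshold) / len(flat)


# Scale implied by the dashboard numbers (assumed to refer to the same sample).
total_tokens = 10_000 * 128            # prompts x tokens per prompt
density_percent = 0.128                # value shown on the dashboard
approx_active = round(total_tokens * density_percent / 100)

print(total_tokens, approx_active)

# Toy check of the helper: one active position out of four.
toy = [[0.0, 1.5], [0.0, 0.0]]
print(activation_density(toy))
```

Under these assumptions the feature fires on roughly 1,600 of the 1.28 million token positions, so it is sparse.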