Explanations

Tokens that introduce lists, such as "including" or "such"

Top Features by Cosine Similarity

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

erg

-0.09

рави

-0.09

-0.08

 Bever

-0.08

erg

-0.08

Wah

-0.08

 coer

-0.08

bable

-0.08

enger

-0.08

Positive Logits

 including

0.25

including

0.20

 include

0.18

包括

0.18

 such

0.15

 includ

0.15

 включ

0.15

 Including

0.15

 includes

0.15

 inclu

0.14
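
Byte-level BPE vocabularies store non-ASCII tokens as printable stand-ins for their raw UTF-8 bytes, so a token such as 包括 appears in the raw vocabulary as åĮħæĭ¬. The sketch below reverses that mapping. It assumes the model's tokenizer uses the GPT-2 byte-to-character table, which this page does not state; `decode_token` is a hypothetical helper name, not part of any dashboard API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2 style table mapping each byte to a printable character."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    chars = printable[:]
    n = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted to code points 256 and above.
            printable.append(b)
            chars.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(printable, chars)}


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Turn a byte-level token string back into readable text (hypothetical helper)."""
    raw = bytearray()
    for ch in display:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. a literal space) pass through as UTF-8.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


print(decode_token("åĮħæĭ¬"))
print(decode_token(" Ð²ÐºÐ»ÑİÑĩ"))
```

Tokens that are already plain ASCII, such as " including", come back unchanged.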

Activations Density: 0.044%
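
The density figure can be read as the share of token positions on which the feature fires. The sketch below makes two assumptions that this page does not confirm: that density is the fraction of positions with activation above zero, and that it is measured over the 10,000 prompts of 128 tokens listed under Configuration. `activation_density` is a hypothetical helper and the toy activations are invented for illustration.

```python
def activation_density(acts: list[list[float]]) -> float:
    """Fraction of token positions where the feature activation is above zero."""
    total = sum(len(row) for row in acts)
    active = sum(1 for row in acts for a in row if a > 0)
    return active / total


# Toy activations (made up): 4 prompts of 8 tokens, feature fires on 2 positions.
toy_acts = [[0.0] * 8 for _ in range(4)]
toy_acts[0][3] = 1.2
toy_acts[2][5] = 0.4
toy_density = activation_density(toy_acts)  # 2 of 32 positions

# Under the stated assumptions, a density of 0.044% over 10,000 x 128 tokens
# corresponds to roughly this many active token positions.
n_tokens = 10_000 * 128
expected_active = 0.00044 * n_tokens

print(f"toy density: {toy_density:.4f}")
print(f"approx. active positions: {expected_active:.0f}")
```

If the density was instead computed over a different token sample, the estimated count of active positions would scale with that sample's size.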