Explanations

give specific output

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

"로나"  -0.09
"관리자"  -0.08
"กว"  -0.08
"EMPLARY"  -0.08
"<Props"  -0.08
"ziej"  -0.08
"ekk"  -0.08
"／:"  -0.08
"bum"  -0.08
"aterno"  -0.08

Positive Logits

" based"  0.10
"bas"  0.10
"based"  0.10
" dựa"  0.09
" reply"  0.09
" Send"  0.09
"基于"  0.08
"'value"  0.08
" send"  0.08
" according"  0.08

Activations Density: 0.005%
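
To give a sense of scale for the density figure, the sketch below works out how many token positions it would correspond to. It assumes that the density is the fraction of tokens on which the feature is active, and that it was measured over the same 10,000 prompts of 128 tokens listed under Configuration. The page does not confirm either assumption, and the variable names are illustrative only.

```python
# Assumption: density = fraction of tokens with a nonzero activation,
# measured over the dashboard prompt set (10,000 prompts x 128 tokens).
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.005  # value shown on the dashboard

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
expected_active = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens}")
print(f"expected active tokens: {round(expected_active)}")
```

Under those assumptions the feature would fire on roughly 64 of 1,280,000 tokens, which is consistent with a very sparse feature.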