Explanations

a/an positive adjectives

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token          Logit
" surprise"    -0.09
"adel"         -0.09
"HEME"         -0.08
"oment"        -0.08
"engin"        -0.08
" usur"        -0.08
"elez"         -0.08
"æľĭ"          -0.08
"ruc"          -0.08
"ÑģÐ¾Ðº"       -0.08
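
The two non-ASCII entries among the negative logits look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below assumes the model's tokenizer uses the GPT-2 style byte-to-unicode mapping (an assumption; the page does not name the tokenizer) and shows how such strings could be mapped back to UTF-8 text. The helper names are hypothetical.

```python
def bytes_to_unicode():
    """GPT-2 style table: every byte 0..255 gets a printable unicode character."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("\u00a1"), ord("\u00ac") + 1))
        + list(range(ord("\u00ae"), ord("\u00ff") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted into the range starting at U+0100.
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return dict(zip(byte_values, map(chr, code_points)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into readable text."""
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. a literal space) pass through.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


print(decode_token("\u00e6\u013e\u012d"))                  # 朋
print(decode_token("\u00d1\u0123\u00d0\u00be\u00d0\u00ba"))  # сок
print(decode_token(" surprise"))                           # " surprise", unchanged
```

Under that assumption, the two entries correspond to the Chinese character 朋 and the Russian word сок. If the tokenizer uses a different scheme, the mapping does not apply.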

Positive Logits

Token          Logit
" great"       0.27
" excellent"   0.24
" good"        0.23
" nice"        0.19
" effective"   0.18
" perfect"     0.18
" popular"     0.17
" ideal"       0.16
"great"        0.16
"fun"          0.15

Activations Density 0.065%
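
To give the density figure a sense of scale, the arithmetic below combines it with the dashboard configuration. It assumes the density is measured over the same 10,000 prompts of 128 tokens listed under Configuration, which the page does not state explicitly.

```python
# Values copied from the page; the link between them is an assumption.
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.065  # "Activations Density 0.065%"

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
# Approximate count of token positions on which the feature is active.
approx_active_tokens = round(total_tokens * DENSITY_PERCENT / 100)

print(total_tokens)          # 1280000
print(approx_active_tokens)  # 832
```

On that reading, the feature fires on roughly 832 of 1,280,000 token positions, or about one token in every 1,500.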