Explanations

is not

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token          Logit
" slightly"    -0.15
"有点"         -0.13
" hơi"         -0.12
" somewhat"    -0.11
"较"           -0.10
" немного"     -0.10
" relatively"  -0.09
" blatantly"   -0.09
"éĥ"           -0.09
" vain"        -0.09

Positive Logits

Token          Logit
" completely"  0.22
" absolutely"  0.19
" definitely"  0.19
"完全"         0.18
"far"          0.18
"根本"         0.18
"pletely"      0.17
" entirely"    0.16
" kesinlikle"  0.16
" hoàn"        0.15

Activations Density 0.160%
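
To put the density figure in context, the following is a minimal sketch of the arithmetic. It assumes (this is not stated on the page) that the density is the fraction of dashboard tokens on which the feature fires, measured over the 10,000 prompts of 128 tokens each listed under Configuration. The function and constant names are hypothetical.

```python
# Assumption: "Activations Density" is the share of dashboard tokens on which
# the feature is active, measured over the configured dashboard prompts.
NUM_PROMPTS = 10_000        # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128     # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.160     # from "Activations Density"


def estimated_active_tokens(num_prompts: int, tokens_per_prompt: int,
                            density_percent: float) -> int:
    """Estimate how many tokens activate the feature, rounded to the nearest token."""
    total = num_prompts * tokens_per_prompt
    return round(total * density_percent / 100)


total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
active_tokens = estimated_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)
print(f"{active_tokens} of {total_tokens} tokens")
```

Under that assumption, the feature fires on roughly one token in every 625.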