Explanations

especially if or because

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token        Logit
" afin"      -0.13
"aby"        -0.10
" чтобы"     -0.10
"才能"       -0.09
" щоб"       -0.09
"cela"       -0.09
" chứ"       -0.09
" rather"    -0.09
" değil"     -0.09
"askell"     -0.08

Positive Logits

Token           Logit
" especially"   0.23
" because"      0.21
" compared"     0.21
"especially"    0.18
" Especially"   0.17
"because"       0.16
"因为"          0.15
" особенно"     0.15
" Because"      0.15
"，因为"        0.14
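
Dashboards that print raw vocabulary strings from a byte-level BPE tokenizer show non-ASCII tokens in a byte-to-character encoding, so the token 因为 appears as `åĽłä¸º`. The sketch below recovers the readable text, assuming the tokenizer uses the byte-to-unicode table published with GPT-2; the source does not name the tokenizer, so that is an assumption, and `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode():
    """Byte -> printable character table as used by GPT-2 style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to the code points starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(raw: str) -> str:
    """Turn a raw vocabulary string back into readable UTF-8 text."""
    out = bytearray()
    for ch in raw:
        if ch in UNICODE_TO_BYTE:
            out.append(UNICODE_TO_BYTE[ch])
        else:
            # Assumption: characters outside the table (such as a literal
            # space drawn by the dashboard) are passed through unchanged.
            out.extend(ch.encode("utf-8"))
    return out.decode("utf-8", errors="replace")


print(decode_token("åĽłä¸º"))   # 因为
print(decode_token(" deÄŁil"))  # " değil" with its leading space
```

Plain ASCII tokens such as `because` pass through unchanged, so the helper can be applied to every row of a logit list.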

Activations Density 0.081%
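
To give the density figure a sense of scale, the arithmetic below converts it to a token count. It assumes the density is the fraction of tokens with nonzero activation and that it was measured over the dashboard prompt set listed under Configuration; the page states neither, so treat the result as a rough estimate.

```python
# Values taken from the dashboard.
n_prompts = 10_000
tokens_per_prompt = 128
density_pct = 0.081

# Assumption: density = activating tokens / all tokens in the dashboard sample.
total_tokens = n_prompts * tokens_per_prompt
active_tokens = total_tokens * density_pct / 100

print(f"{total_tokens:,} tokens, roughly {active_tokens:.0f} with nonzero activation")
```

Under these assumptions the feature fires on roughly one token in every 1,200.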