Explanations

subject + auxiliary verb

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token            Logit
" youre"         -0.09
" Hale"          -0.08
"anton"          -0.08
"itler"          -0.08
"lixir"          -0.08
"のが"           -0.08
"uien"           -0.07
"也是"           -0.07
" counterpart"   -0.07
"也不"           -0.07

Positive Logits

Token       Logit
"has"       0.23
" have"     0.23
" telah"    0.23
" đã"       0.21
" heeft"    0.17
"_have"     0.17
" Have"     0.17
" hath"     0.17
"had"       0.16
"'ve"       0.16

Activations Density 0.065%
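
To put the density figure in context, it can be converted into an approximate token count. This sketch assumes that density means the fraction of dashboard tokens on which the feature has a nonzero activation, and that every one of the 10,000 prompts contributes a full 128 tokens. Neither assumption is confirmed on this page.

```python
# Values taken from the Configuration and Activations Density lines.
num_prompts = 10_000
tokens_per_prompt = 128
density_percent = 0.065

# Assumption: every prompt is exactly 128 tokens long.
total_tokens = num_prompts * tokens_per_prompt

# Assumption: density = (tokens with nonzero activation) / (all tokens).
active_tokens = round(total_tokens * density_percent / 100)

print(f"total tokens:  {total_tokens:,}")
print(f"active tokens: {active_tokens:,}")
```

Under these assumptions the feature fires on roughly 832 of the 1,280,000 dashboard tokens.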