Explanations

by or byz

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

رياض

-0.13

stead

-0.10

由

-0.10

Ids

-0.10

stants

-0.10

 regard

-0.10

 виду

-0.10

ahkan

-0.09

mit

-0.09

Idx

-0.09

Positive Logits

gone

0.26

-election

0.23

-products

0.22

-pass

0.21

products

0.20

elor

0.18

laws

0.18

-product

0.18

product

0.18

ond

0.16

Activations Density 0.074%
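
To give the density figure some scale, here is a rough back-of-the-envelope sketch. It assumes (this is not stated on the page) that the density is the fraction of token positions with a nonzero activation, and that it was measured over the dashboard prompt set listed under Configuration. The variable names are illustrative only.

```python
# Assumption: density = fraction of token positions where the feature fires,
# measured over the dashboard prompt set (10,000 prompts x 128 tokens).
prompts = 10_000
tokens_per_prompt = 128
density = 0.074 / 100  # 0.074% expressed as a fraction

total_tokens = prompts * tokens_per_prompt
expected_active = total_tokens * density

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {round(expected_active):,}")
```

Under those assumptions the feature would fire on roughly one token in every 1,350, so it is sparse and token-specific rather than broadly active.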