Explanations

angle, direction, orientation

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

" disproportion"  -0.10
" plaster"        -0.10
"agas"            -0.09
" Distance"       -0.09
"uct"             -0.09
" Shan"           -0.09
" Wass"           -0.09
" Lantern"        -0.08
"chio"            -0.08
" Luther"         -0.08

Positive Logits

" polar"          0.29
" Polar"          0.24
" polarization"   0.22
" filter"         0.19
" поля"           0.18
" filters"        0.17
"dep"             0.16
" Filter"         0.15
" Brew"           0.15
"偏"              0.15

Activations Density 0.016%
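
To give the density figure some scale, the short sketch below converts it into an approximate token count using the dashboard configuration listed above. It assumes that "Activations Density" means the fraction of dashboard tokens on which the feature has a non-zero activation; the page does not state this definition, and the function name is purely illustrative. Because the density is shown rounded to 0.016%, the result is only an estimate.

```python
# Figures taken from the dashboard configuration and the density readout.
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.016


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total_tokens, estimated count of tokens with non-zero activation)."""
    total_tokens = num_prompts * tokens_per_prompt
    # Assumption: density = active tokens / total tokens, shown as a percentage.
    active = total_tokens * density_percent / 100
    return total_tokens, active


total, active = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"{total:,} tokens in total, roughly {active:.0f} with non-zero activation")
```

Under that assumption the feature fires on only a couple of hundred tokens out of about 1.28 million, which is consistent with a sparse, narrowly scoped feature.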