Explanations

covered in or with


Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m


Negative Logits

Token        Logit
" tether"    -0.09
" Benton"    -0.09
".backup"    -0.09
"Partition"  -0.09
"穿"         -0.09
" earrings"  -0.08
" Accord"    -0.08
" serr"      -0.08
" sandals"   -0.08

Positive Logits

Token        Logit
" layer"     0.20
" layers"    0.19
" blanket"   0.16
" blankets"  0.15
"层"         0.14
"layers"     0.13
" Layers"    0.13
"lớp"        0.12
"層"         0.12
"layer"      0.12

Activations Density 0.077%
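
To give a rough sense of scale for the density figure, the sketch below converts it into an approximate count of token positions. It assumes, and the page does not confirm this, that the density is measured over the dashboard prompt set and that every prompt is a full 128 tokens long. The function name expected_active_tokens is hypothetical and used only for illustration.

```python
# Values copied from the dashboard configuration and density line.
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.077


def expected_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Estimate how many token positions activate the feature.

    Assumption: density is the percentage of all dashboard token
    positions on which the feature has a nonzero activation.
    """
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens * density_percent / 100.0


total = NUM_PROMPTS * TOKENS_PER_PROMPT
active = expected_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)
print(f"{total:,} tokens in total, roughly {active:.0f} with a nonzero activation")
```

Under those assumptions the feature fires on roughly 986 of 1,280,000 token positions, which matches a sparse feature tied to a narrow concept such as "layer" or "blanket".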