Explanations

packing and moving boxes

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token        Logit
"ande"       -0.11
" Heller"    -0.11
" Bucket"    -0.10
"vecs"       -0.09
"andy"       -0.09
" Innoc"     -0.09
" rake"      -0.09
"rl"         -0.09
" rehe"      -0.09
" contr"     -0.09

Positive Logits

Token        Logit
" fragile"   0.16
" moving"    0.14
"Bubble"     0.14
" Bubble"    0.13
" Frag"      0.13
" bubble"    0.12
"frag"       0.12
"Moving"     0.12
" unpack"    0.12
" Moving"    0.11
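The page does not say how these logit lists are produced. For sparse-autoencoder features, a common approach is to project the feature's decoder direction through the model's unembedding matrix and rank the vocabulary by the resulting scores. The sketch below assumes that method. The helper names `logit_effects` and `top_k` are hypothetical, and the toy vectors and tokens are made-up illustrations, not data from this feature.

```python
from typing import Dict, List, Tuple

# Hypothetical helper (assumed method): score each token by the dot product of
# the feature's decoder direction with that token's unembedding vector.
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

# Hypothetical helper: return the k highest-scoring tokens (positive=True)
# or the k lowest-scoring tokens (positive=False), like the two lists above.
def top_k(effects: Dict[str, float], k: int,
          positive: bool = True) -> List[Tuple[str, float]]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]

# Toy 2-dimensional example; every number here is invented for illustration.
decoder_dir = [1.0, 0.0]
unembed = {
    " box": [0.3, 0.9],
    " tape": [0.2, -0.4],
    " lion": [-0.1, 0.7],
    " seven": [-0.05, 0.0],
}
effects = logit_effects(decoder_dir, unembed)
print(top_k(effects, 2, positive=True))
print(top_k(effects, 1, positive=False))
```

With `positive=False` the sort is ascending, so the most negative scores come first. This matches the ordering of the Negative Logits list.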

Activation Density: 0.016%
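Activation density usually means the fraction of sampled tokens on which the feature's activation is above zero. The page does not say which sample the 0.016% figure was measured on. Suppose, as an assumption, that it was the dashboard's 10,000 prompts of 128 tokens each. That sample is 1,280,000 tokens, and 0.016% of it is about 205 tokens. The helper names below are hypothetical.

```python
from typing import Sequence

# Hypothetical helper: fraction of token positions where the feature fires,
# i.e. where its activation exceeds the threshold (zero by default).
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical helper: number of active tokens implied by a density given in
# percent, for n_prompts prompts of tokens_per_prompt tokens each.
def expected_active_tokens(n_prompts: int, tokens_per_prompt: int,
                           density_pct: float) -> float:
    return n_prompts * tokens_per_prompt * density_pct / 100

# Toy activations: 1 of 8 positions is active, so the density is 0.125.
print(activation_density([0.0, 0.0, 2.3, 0.0, 0.0, 0.0, 0.0, 0.0]))
# Assumed sample size taken from the dashboard configuration.
print(expected_active_tokens(10_000, 128, 0.016))
```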