INDEX

Explanations

reflecting on experiences

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token         Logit
"AAF"         -0.11
"liner"       -0.11
"aret"        -0.10
"ughter"      -0.10
"ummer"       -0.10
"ocha"        -0.09
"upt"         -0.09
"liness"      -0.09
"ulin"        -0.09
"emale"       -0.09

Positive Logits

Token         Logit
"ively"       0.21
"ive"         0.21
"ivity"       0.19
"ors"         0.18
".DeepEqual"  0.17
".TypeOf"     0.13
"iveness"     0.13
" poorly"     0.12
"IVE"         0.12
"ives"        0.11

Activations Density: 0.017%
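
To give a sense of scale for the density figure, the sketch below converts it into an approximate token count. It assumes, and this is an assumption rather than something the page states, that the density is the fraction of tokens on which the feature fires and that it applies to a sample the size of the dashboard prompt set listed under Configuration. The function name is hypothetical.

```python
# Assumption: density = fraction of tokens with a nonzero activation,
# applied to a sample the size of the dashboard prompt set.
NUM_PROMPTS = 10_000        # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128     # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.017     # from "Activations Density"


def expected_active_tokens(num_prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> float:
    """Approximate number of tokens on which the feature would be active."""
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens * density_percent / 100.0


total = NUM_PROMPTS * TOKENS_PER_PROMPT
active = expected_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)
print(f"{total:,} tokens in the sample, roughly {round(active)} active")
```

Under these assumptions the feature would fire on only a couple of hundred tokens out of about 1.28 million, which is consistent with a sparse feature.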