Explanations

self- or high- prefix

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

"ientos"     -0.09
" Damen"     -0.09
".esp"       -0.09
"tej"        -0.09
"rapper"     -0.08
"forces"     -0.08
" rigor"     -0.08
"enerator"   -0.08
"orld"       -0.08
" convers"   -0.08

Positive Logits

"etre"       0.11
" è¾"        0.09
"sie"        0.08
"åĬłå·¥"     0.08
"Bew"        0.08
"antal"      0.08
"samp"       0.08
"ug"         0.08
" Raum"      0.08
"tems"       0.08
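
Some tokens in the lists above, such as åĬłå·¥, look garbled because they appear to be shown in a byte-level BPE vocabulary form, where each raw byte is mapped to a printable character. The sketch below is a minimal illustration, assuming the model uses the GPT-2 style byte-to-character table (the page itself does not confirm this); `bytes_to_unicode` and `decode_token` are illustrative names, not part of any dashboard API. Under the same assumption, è¾ would be an incomplete UTF-8 sequence (bytes E8 BE), so it does not decode to a full character on its own.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable character table (assumed here)."""
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code points starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Invert the table: printable character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a vocabulary-form token back to bytes, then decode as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("åĬłå·¥"))  # 加工
```

Plain ASCII tokens such as `etre` or `samp` pass through this mapping unchanged, which is why only the non-Latin entries look unusual.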

Activation Density: 0.108%
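
To put the density figure in context, the arithmetic below estimates how many tokens that percentage would correspond to. It is a rough sketch that assumes the density is the fraction of dashboard tokens on which the feature is nonzero, and that it was measured over the 10,000 prompts of 128 tokens listed under Configuration; neither assumption is stated on the page, and the function name is hypothetical.

```python
# Values taken from the page.
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.108


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens with nonzero activation).

    Assumes density = active tokens / total tokens (an assumption, see above).
    """
    total = num_prompts * tokens_per_prompt
    active = round(total * density_percent / 100)
    return total, active


total_tokens, active_tokens = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(total_tokens, active_tokens)  # 1280000 1382
```

Under these assumptions the feature would fire on roughly 1,400 of the 1.28 million dashboard tokens, or about one token in every 900.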