Explanations

here is, following code

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token         Logit
" Rena"       -0.09
"aint"        -0.09
"EMS"         -0.09
"lys"         -0.08
"EMS"         -0.08
"aches"       -0.08
"yon"         -0.08
"å¯¸"         -0.08
" shemale"    -0.08
"nen"         -0.08

Positive Logits

Token         Logit
"¡´"          0.10
" illum"      0.09
" Richt"      0.09
"_configure"  0.09
"cap"         0.08
" tack"       0.08
"cop"         0.08
"erdem"       0.08
"Ł¥"          0.08
"elib"        0.08
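
The two logit lists above read as the ten lowest and ten highest per-token values for this feature. A minimal sketch of that ranking step follows, assuming the values are already available as (token, value) pairs. `top_logits` is a hypothetical helper written for illustration, not part of any dashboard API, and `sample` holds only a few of the entries listed above.

```python
from typing import List, Tuple

Pair = Tuple[str, float]


def top_logits(pairs: List[Pair], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return (k most negative, k most positive) token/value pairs.

    Hypothetical helper: it only sorts values that were computed elsewhere.
    """
    ranked = sorted(pairs, key=lambda pair: pair[1])  # ascending by value
    negative = ranked[:k]                             # lowest values first
    positive = ranked[::-1][:k]                       # highest values first
    return negative, positive


# A few entries copied from the lists above (leading spaces are part of the token).
sample = [(" Rena", -0.09), ("lys", -0.08), ("cap", 0.08), (" illum", 0.09)]
negative, positive = top_logits(sample, k=1)
```

A list of pairs is used rather than a dict because the same token string ("EMS") appears twice in the negative list with different values.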

Activations Density 0.004%
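
As a rough sanity check on the density figure, the sketch below converts it into an approximate token count. It rests on two assumptions the page does not state: that density means the fraction of tokens on which the feature has a nonzero activation, and that it was measured over the 10,000 x 128 dashboard tokens listed under Configuration. `expected_active_tokens` is a hypothetical helper.

```python
def expected_active_tokens(n_prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> float:
    """Approximate number of tokens on which the feature is active.

    Assumes density is a percentage of all tokens in the prompt set.
    """
    total_tokens = n_prompts * tokens_per_prompt
    return total_tokens * density_percent / 100.0


total_tokens = 10_000 * 128                               # 1,280,000 tokens
approx_active = expected_active_tokens(10_000, 128, 0.004)
print(f"about {approx_active:.0f} of {total_tokens:,} tokens")
```

Because 0.004% is a rounded figure, the result is only an order-of-magnitude estimate: the feature would fire on a few dozen tokens out of the whole set.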