Explanations

"pas" followed by "os", or French negation

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token     Logit
eh        -0.10
uff       -0.10
tras      -0.10
pine      -0.10
hem       -0.10
eted      -0.09
helm      -0.09
ÂŃi       -0.09
tring     -0.09
pector    -0.09

Positive Logits

Token     Logit
sthrough  0.19
adena     0.15
ible      0.13
ively     0.12
ionate    0.12
sth       0.12
SED       0.12
engers    0.11
enger     0.11
ION       0.11

Activations Density 0.015%
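
The density figure can be related to the dashboard configuration with a little arithmetic. The sketch below assumes that density means the fraction of token positions where the feature has a nonzero activation across the dashboard prompts; the page does not state this definition, and `activation_density` is a hypothetical helper rather than part of any dashboard API.

```python
def activation_density(activations):
    """Fraction of token positions where the feature fires (activation > 0).

    `activations` is a list of prompts, each a list of per-token activation
    values. This definition of density is an assumption, not taken from the page.
    """
    total = 0
    active = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > 0:
                active += 1
    return active / total if total else 0.0


# Dashboard configuration from above: 10,000 prompts of 128 tokens each.
n_prompts, n_tokens = 10_000, 128
total_tokens = n_prompts * n_tokens  # 1,280,000 token positions

# Under the assumed definition, a density of 0.015% corresponds to roughly
# this many active token positions in the whole dashboard sample.
approx_active = round(total_tokens * 0.015 / 100)
```

Under these assumptions the feature would fire on about 192 of the 1,280,000 sampled token positions, which fits a feature tied to a narrow pattern such as a specific token sequence.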