Explanations

words starting with D

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token    Logit
ump      -0.17
aver     -0.17
emo      -0.16
addy     -0.15
ays      -0.14
руг      -0.14
iff      -0.14
ir       -0.14
ems      -0.14
esc      -0.14
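Byte-level BPE vocabularies store non-ASCII tokens as remapped byte strings, so the Cyrillic token руг appears in the raw vocabulary as ÑĢÑĥÐ³. The sketch below recovers the readable form, assuming the model uses the GPT-2-style byte-to-character table (the page does not name the tokenizer, so this is an assumption). The helper names `bytes_to_unicode` and `decode_token` are illustrative.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table, as in GPT-2-style byte-level BPE (assumed)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Invert the table so each vocabulary character maps back to its byte.
_CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(raw: str) -> str:
    """Turn a raw vocabulary string back into readable UTF-8 text."""
    return bytes(_CHAR_TO_BYTE[ch] for ch in raw).decode("utf-8", errors="replace")


print(decode_token("ÑĢÑĥÐ³"))  # руг
```

ASCII tokens such as `ump` pass through unchanged, so the same helper can be applied to every row of the table.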

Positive Logits

Token    Logit
ey       0.14
acia     0.12
acie     0.11
ually    0.11
ocrat    0.10
AS       0.10
osit     0.10
atch     0.10
ang      0.10
hole     0.10

Activations Density 0.053%
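
For a rough sense of scale, the density can be converted into a token count. This is a back-of-the-envelope sketch that assumes two things the page does not state: that density means the fraction of tokens on which the feature activates, and that it was measured over the dashboard prompt set listed under Configuration.

```python
# Figures taken from the dashboard; the interpretation of "density" is an assumption.
num_prompts = 10_000
tokens_per_prompt = 128
density = 0.053 / 100  # 0.053% expressed as a fraction

total_tokens = num_prompts * tokens_per_prompt
active_tokens = total_tokens * density

print(total_tokens)          # 1280000
print(round(active_tokens))  # 678
```

Under those assumptions the feature fires on roughly 680 of the 1.28 million dashboard tokens.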