Explanations

string startswith or delete

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

 "-";\n

-0.12

 '-';\n

-0.12

('.');\n

-0.12

('_',

-0.11

"[%

-0.10

 '/');\n

-0.10

('/');\n

-0.09

 proverb

-0.09

GI

-0.08

inx

-0.08

Positive Logits

("

0.12

0.11

`"

0.11

_("

0.11

azzi

0.10

str

0.10

"-

0.09

оть

0.09

anda

0.09

azo

0.09

Activations Density 0.065%