Explanations

appreciate and variations

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

gne           -0.11
inality       -0.10
eter          -0.09
øre           -0.09
ething        -0.08
aders         -0.08
ish           -0.08
hots          -0.08
libs          -0.08
_SOCKET       -0.08

Positive Logits

iable         0.15
iative        0.14
iation        0.12
iate          0.12
ably          0.12
/respond      0.12
iations       0.11
iating        0.11
 appreciate   0.11
ments         0.10

Activations Density 0.016%