INDEX

Explanations

lists including or whether

Top Features by Cosine Similarity

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

"яви"         -0.09
" zwar"       -0.09
"ronic"       -0.09
"ahoma"       -0.09
"арь"         -0.08
"conto"       -0.08
"ocaly"       -0.08
"reminder"    -0.08
" Towers"     -0.08

Positive Logits

" qualities"  0.12
"whether"     0.11
" whether"    0.11
" something"  0.10
"aren"        0.10
"uentes"      0.09
" sniff"      0.09
"NullOr"      0.08
"ass"         0.08
"rah"         0.08

Activations Density 0.030%
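
To put the density figure in concrete terms, the sketch below converts it into an approximate token count. It assumes that the density is the fraction of tokens on which the feature fires, and that it is measured over the dashboard prompt set listed under Configuration (10,000 prompts of 128 tokens each); the page does not confirm either point, and the function name is hypothetical.

```python
def expected_active_tokens(num_prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> float:
    """Estimate how many tokens activate a feature at a given density.

    Assumption: density is the share of all tokens with a nonzero
    activation, expressed as a percentage.
    """
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens * density_percent / 100.0


# Values taken from the Configuration and Activations Density entries.
total = 10_000 * 128
active = expected_active_tokens(10_000, 128, 0.030)
print(f"{total} tokens in total, roughly {round(active)} with a nonzero activation")
```

Because the displayed density is rounded to three decimal places, the resulting count is only an estimate.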