INDEX

Explanations

positive abstract qualities

Top Features by Cosine Similarity

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token        Logit
 ãĢģ         -0.09
.Formatter   -0.09
ect          -0.09
uraa         -0.09
ouv          -0.08
-or          -0.08
ARIABLE      -0.08
Ø§ØŃØª       -0.08
¦æĥħ         -0.08
 -*-č\n      -0.08

Positive Logits

Token        Logit
":""         0.10
 Kron        0.09
orque        0.09
ÐµÐºÑģÐ¸     0.08
aptors       0.08
hores        0.08
enen         0.08
enton        0.07
uje          0.07
 Guerrero    0.07

Activations Density 0.123%
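
To put the density figure in context, the short sketch below converts it into an approximate token count. It assumes, without confirmation from this page, that the density is the fraction of dashboard tokens on which the feature has a nonzero activation, and that it was measured over the 10,000 prompts of 128 tokens listed under Configuration. The function name `approx_active_tokens` is hypothetical.

```python
def approx_active_tokens(n_prompts: int, tokens_per_prompt: int, density_percent: float) -> int:
    """Estimate how many tokens activate the feature.

    Assumption: density_percent is the share of all dashboard tokens
    with a nonzero activation for this feature.
    """
    total_tokens = n_prompts * tokens_per_prompt
    return round(total_tokens * density_percent / 100)


# Values taken from the Configuration and Activations Density lines.
total = 10_000 * 128
active = approx_active_tokens(10_000, 128, 0.123)
print(f"{active} of {total} tokens")
```

Under these assumptions the feature fires on roughly 1,574 of 1,280,000 tokens, which is consistent with a sparse feature.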