Explanations

abbreviations followed by definitions

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

"ë"         -0.09
" Minds"    -0.08
"itra"      -0.08
"è¨³"       -0.08
" aggress"  -0.08
" æĽ°"      -0.08
" Cater"    -0.08
"swire"     -0.07
"spath"     -0.07
" ï½Ģ"      -0.07

Positive Logits

" itself"     0.16
" proper"     0.15
"ä½ľä¸º"      0.15
" referring"  0.13
"as"          0.13
" sebagai"    0.11
"als"         0.11
" refers"     0.11
" meaning"    0.10
" jako"       0.10

Activations Density 0.213%