Explanations

justice, injustices, judiciary

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token        Logit
"itzer"      -0.12
"sWith"      -0.09
" culpa"     -0.08
"sm"         -0.08
" scratch"   -0.08
"/stdc"      -0.08
"thic"       -0.08
"хови"       -0.08
"pra"        -0.08
"bis"        -0.08

Positive Logits

Token        Logit
"ifiable"    0.16
"ous"        0.14
"ifi"        0.14
"icial"      0.12
"ices"       0.12
"/right"     0.12
"ously"      0.12
"iciary"     0.12
"iros"       0.12
" distrib"   0.11

Activations Density 0.021%
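
To give a sense of scale for the density figure, the sketch below converts it into an approximate token count. It assumes, and this is an assumption rather than something the dashboard states, that the density is the fraction of tokens in the dashboard prompt set (10,000 prompts, 128 tokens each) on which the feature has a nonzero activation. The function and constant names are illustrative only.

```python
# Assumption: "Activations Density" is the share of dashboard tokens on which
# the feature fires (nonzero activation). The names below are hypothetical.
NUM_PROMPTS = 10_000        # from the dashboard configuration
TOKENS_PER_PROMPT = 128     # from the dashboard configuration
DENSITY_PERCENT = 0.021     # reported activations density, in percent


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return the approximate number of tokens with a nonzero activation."""
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens * density_percent / 100.0


total = NUM_PROMPTS * TOKENS_PER_PROMPT
active = estimated_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)
print(f"{total:,} tokens, roughly {active:.0f} with a nonzero activation")
```

Under that assumption, the feature fires on only a few hundred of the 1,280,000 tokens, which fits a narrowly scoped feature.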