INDEX

Explanations

not trained on different

Top Features by Cosine Similarity
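
A minimal sketch of how a "top features by cosine similarity" ranking could be produced, assuming each feature is represented by one vector (for example its decoder direction) and neighbors are ranked by the cosine of the angle between vectors. The names `cosine_similarity`, `top_features_by_cosine`, and `toy_decoder` are hypothetical, and the numbers are toy values, not data from this dashboard.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_features_by_cosine(query_index, feature_vectors, k=3):
    """Rank every other feature by similarity to the query feature."""
    query = feature_vectors[query_index]
    scored = [
        (cosine_similarity(query, vec), i)
        for i, vec in enumerate(feature_vectors)
        if i != query_index
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(i, score) for score, i in scored[:k]]

# Toy feature vectors (hypothetical): one row per feature.
toy_decoder = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
]
neighbors = top_features_by_cosine(0, toy_decoder, k=2)
```

Here `neighbors` lists feature 1 first (nearly parallel to feature 0), then feature 2 (orthogonal); feature 3 points the opposite way and is ranked last.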

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

".sav"              -0.11
"xea"               -0.09
" unchanged"        -0.09
" reput"            -0.09
"tsy"               -0.09
" unconventional"   -0.09
"ipa"               -0.09
" alike"            -0.09
"olie"              -0.09
" unusual"          -0.08

Positive Logits

" separate"         0.32
" distinct"         0.32
" independent"      0.29
"独立"              0.27
" Separate"         0.26
" riêng"            0.25
"distinct"          0.23
" independ"         0.23
" отдель"           0.22
" independently"    0.21
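
A rough sketch of how per-token logit lists like the ones above are commonly derived for sparse-autoencoder features: the feature's decoder direction is dotted with each token's unembedding vector, and the tokens are sorted by the result. Whether this dashboard computes its values exactly this way is an assumption. The function `logit_effects` and all vectors below are hypothetical toy values, not the real weights.

```python
def logit_effects(decoder_vector, unembed_rows, vocab):
    """Return (effect, token) pairs sorted from most promoted to most suppressed.

    unembed_rows[i] is the unembedding vector for vocab[i].
    """
    effects = []
    for token, row in zip(vocab, unembed_rows):
        effect = sum(d * w for d, w in zip(decoder_vector, row))
        effects.append((effect, token))
    return sorted(effects, reverse=True)

# Toy values (hypothetical), chosen only to make the arithmetic easy to follow.
vocab = [" separate", " distinct", " unusual"]
unembed_rows = [
    [0.5, 0.25],   # " separate"
    [0.5, 0.0],    # " distinct"
    [-0.5, 0.0],   # " unusual"
]
decoder_vector = [1.0, 2.0]

ranked = logit_effects(decoder_vector, unembed_rows, vocab)
positive = [(tok, eff) for eff, tok in ranked if eff > 0]
negative = [(tok, eff) for eff, tok in reversed(ranked) if eff < 0]
```

In this toy setup `positive` starts with " separate" and `negative` contains only " unusual", mirroring the split into positive and negative lists.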

Activations Density 0.127%
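
As a worked example, the density figure can be turned into an approximate token count. This assumes the density is the fraction of tokens on which the feature has a nonzero activation, and that it was measured over the dashboard prompt set listed under Configuration (10,000 prompts of 128 tokens each); neither assumption is confirmed by the page. The helper `activation_density` is hypothetical.

```python
prompts = 10_000
tokens_per_prompt = 128
density_percent = 0.127

# Total tokens in the dashboard prompt set.
total_tokens = prompts * tokens_per_prompt

# Tokens expected to activate the feature at the reported density.
expected_active = total_tokens * density_percent / 100

def activation_density(activations):
    """Fraction of positions where the feature activation is above zero."""
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)
```

Under these assumptions the feature would fire on roughly 1,626 of 1,280,000 tokens, or about one token in every 787.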