Explanations

identifying a word's definition

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token              Logit
" phrases"         -0.18
" phrase"          -0.17
"Phrase"           -0.16
" abbrev"          -0.16
" Phrase"          -0.15
"phrase"           -0.15
"Abb"              -0.13
" abbreviation"    -0.13
" acronym"         -0.12
"Abb"              -0.12

Positive Logits

Token              Logit
" word"            0.18
" Word"            0.16
"Mer"              0.14
"Word"             0.14
" dictionary"      0.13
" definition"      0.13
" Oxford"          0.13
".word"            0.13
"word"             0.12
" Webster"         0.12

Activations Density 0.194%
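
To put the density figure in context, here is a rough back-of-the-envelope sketch in Python. It assumes (this is not stated on the page) that the density was measured over the same sample listed under Configuration, 10,000 prompts of 128 tokens each, and that density means the fraction of tokens on which the feature has a nonzero activation. All variable names are illustrative.

```python
# Assumption: the density was computed over the dashboard sample
# (10,000 prompts x 128 tokens); the page does not confirm this.
num_prompts = 10_000
tokens_per_prompt = 128
density_percent = 0.194  # "Activations Density" as reported

total_tokens = num_prompts * tokens_per_prompt

# Assumption: density = share of tokens with a nonzero activation.
approx_active_tokens = round(total_tokens * density_percent / 100)

print(f"total tokens sampled: {total_tokens:,}")
print(f"approx. tokens where the feature fires: {approx_active_tokens:,}")
```

Under those assumptions the feature would fire on roughly 2,500 of the 1.28 million sampled tokens, which suggests a sparse, fairly specific feature.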