Explanations

drinking coffee and tea

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token          Logit
" dinner"      -0.14
" Dinner"      -0.14
" Cocktail"    -0.12
"pizza"        -0.11
" dinners"     -0.11
"酒"           -0.11
"Pizza"        -0.11
" pizza"       -0.11
" tavern"      -0.11
"beer"         -0.11

Positive Logits

Token          Logit
"lat"          0.23
" coffee"      0.22
" coff"        0.21
" Coffee"      0.19
"cup"          0.18
" Coff"        0.17
" java"        0.17
"Coffee"       0.17
" cups"        0.17
" beans"       0.16

Activations Density 0.050%
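
As a rough sanity check on what this density means in absolute terms, the sketch below converts the percentage into a token count. It assumes the density was measured on the dashboard sample listed under Configuration (10,000 prompts of 128 tokens each); the page does not confirm this, and the variable names are illustrative only.

```python
# Back-of-the-envelope estimate, NOT taken from the dashboard itself.
# Assumption: the activation density was computed over the dashboard
# sample of 10,000 prompts x 128 tokens.
num_prompts = 10_000
tokens_per_prompt = 128
density_percent = 0.050  # "Activations Density" as reported on the page

total_tokens = num_prompts * tokens_per_prompt
expected_active = round(total_tokens * density_percent / 100)

print(f"total tokens in sample: {total_tokens:,}")
print(f"tokens where the feature fires (approx.): {expected_active:,}")
```

Under that assumption, the feature would be active on roughly 640 of 1,280,000 tokens, which is consistent with a narrow, topic-specific feature.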