INDEX

Explanations

inappropriate, hast, nutrition, code, expected, pork

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

" however"      -0.14
"indow"         -0.12
"MBER"          -0.10
" nonetheless"  -0.10
" despite"      -0.10
"oma"           -0.10
" trotz"        -0.09
" 晴"           -0.09
"aled"          -0.09
"Anywhere"      -0.09

Positive Logits

" anyway"       0.56
"Anyway"        0.54
" Anyway"       0.47
" anyways"      0.30
" zaten"        0.22
" stejně"       0.20
" равно"        0.17
" toch"         0.17
" jeden"        0.15
"嘛"            0.14
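The negative and positive logit lists on this page show which output tokens the feature pushes down or up. A common way to produce such lists is to project the feature's decoder direction through the model's unembedding matrix and sort the resulting per-token scores. The page does not state its exact method, so treat this as an assumption. The function name `top_logit_effects`, the variable names `decoder_dir` and `unembed`, and all matrix values below are hypothetical toy data, not the model's real weights.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=3):
    """Project a feature's decoder direction onto the vocabulary (sketch).

    decoder_dir: list of d_model floats (assumed: the feature's decoder row).
    unembed: d_model x vocab_size matrix as a list of rows (assumed: W_U).
    vocab: list of token strings, one per unembedding column.
    Returns (most_negative, most_positive) as lists of (token, score) pairs.
    """
    d_model = len(decoder_dir)
    # score[j] = sum_i decoder_dir[i] * unembed[i][j]: the direct effect on token j's logit
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending so the most negative scores come first
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = list(reversed(ranked[-k:]))  # largest score first
    return most_negative, most_positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary (made-up values)
vocab = [" anyway", " however", " despite", " pork"]
unembed = [
    [0.5, -0.1, -0.05, 0.0],
    [0.2, -0.3, -0.2, 0.01],
]
decoder_dir = [1.0, 0.5]
neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

With these toy values, " however" and " despite" land in the negative list and " anyway" tops the positive list, mirroring the shape (not the numbers) of the lists on this page.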

Activation Density: 0.105%
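Activation density is usually read as the fraction of token positions on which the feature fires (activation above zero). The page does not say which token set the 0.105% figure is measured over. If, as an assumption, it were the dashboard set of 10,000 prompts × 128 tokens = 1,280,000 tokens, a density of 0.105% would correspond to about 1,344 firing tokens. The sketch below shows the computation; `activation_density` and its `threshold` parameter are hypothetical names, and the activation values are made up.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold.

    activations: list of prompts, each a list of per-token activation values.
    threshold: hypothetical cutoff; 0.0 means "any positive activation counts".
    """
    total = 0
    firing = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                firing += 1
    # Guard against an empty input instead of dividing by zero
    return firing / total if total else 0.0


# Toy data: 2 prompts of 4 tokens each (made-up values), 1 firing token out of 8
toy = [[0.0, 0.0, 3.2, 0.0], [0.0, 0.0, 0.0, 0.0]]
print(f"{activation_density(toy):.3%}")  # 12.500%

# Scale check, assuming density is measured on the dashboard prompt set
tokens = 10_000 * 128           # 1,280,000 tokens
print(round(tokens * 0.00105))  # 1344 firing tokens at 0.105% density
```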