Explanations

if you want or need

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

 entr

-0.09

inson

-0.09

 Hlav

-0.09

 probably

-0.09

aha

-0.09

 Haut

-0.09

Kan

-0.09

lik

-0.09

mund

-0.09

elif

-0.08

Positive Logits

 absolutely

0.26

 Absolutely

0.20

Absolutely

0.19

 still

0.19

 must

0.16

 Still

0.16

still

0.16

 absolute

0.15

仍

0.15

还是

0.15

Activations Density 0.048%
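
To put the density figure in concrete terms, the short sketch below converts it into an approximate token count. It assumes the density was measured over the same sample listed under Configuration (10,000 prompts of 128 tokens each), which the page does not state explicitly; the function name is hypothetical and used only for illustration.

```python
def expected_active_tokens(num_prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> tuple[int, float]:
    """Return (total tokens in the sample, expected tokens with nonzero activation).

    Assumption: the density is the fraction of sampled tokens on which the
    feature fires, measured over the dashboard's prompt sample.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens


# Values taken from the Configuration and Activations Density entries.
total, active = expected_active_tokens(10_000, 128, 0.048)
print(f"{total:,} tokens in sample, roughly {round(active)} with nonzero activation")
```

Under that assumption the feature fires on only a few hundred of about 1.28 million tokens, so it is a sparse feature.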