Explanations

a sense of

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

 addCriterion

-0.11

//{{

-0.11

ÂĢÂĢ

-0.10

езульт

-0.09

.metro

-0.09

Verdana

-0.09

<|begin_of_text|>

-0.09

EMPLARY

-0.09

 kaldır

-0.09

 kurtar

-0.08

Positive Logits

0.09

_DEPRECATED

0.08

::

0.08

_/

0.07

 regardless

0.07

-ok

0.07

@@

0.07

 emerg

0.07

Activations Density 0.172%