Explanations

purposeful verbs

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token (quoted to show leading spaces)   Logit
" kennenlernen"                         -0.14
" anzeigen"                             -0.12
" bekan"                                -0.10
" Verfüg"                               -0.10
" weiber"                               -0.10
" erfahren"                             -0.10
" Ekon"                                 -0.10
"lernen"                                -0.09
" möchten"                              -0.09
" anmeld"                               -0.09

Positive Logits

Token (quoted to show leading spaces)   Logit
"vor"                                   0.11
"ober"                                  0.11
"und"                                   0.11
"get"                                   0.11
"war"                                   0.11
" arch"                                 0.10
"we"                                    0.10
" Prot"                                 0.10
"ver"                                   0.10
"iling"                                 0.10

Activations Density 0.027%
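
As a rough sanity check on the density figure, the arithmetic below estimates how many tokens the feature would fire on. It assumes (this is not stated on the page) that the density is the fraction of tokens with a nonzero activation and that it was measured over the same 10,000 prompts of 128 tokens listed under Configuration. The variable names are illustrative only.

```python
# Assumption: density = fraction of sampled tokens on which the feature is active,
# measured over the dashboard sample (10,000 prompts x 128 tokens each).
num_prompts = 10_000
tokens_per_prompt = 128
density_percent = 0.027  # "Activations Density 0.027%"

total_tokens = num_prompts * tokens_per_prompt
estimated_active_tokens = total_tokens * density_percent / 100

print(f"total tokens sampled: {total_tokens}")
print(f"estimated active tokens: {estimated_active_tokens:.0f}")
```

Under those assumptions the feature would be active on only a few hundred of the sampled tokens, which is consistent with a sparse feature.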