Explanations

concepts and technical domains

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

-0.12

raf

-0.11

 surrounding

-0.10

gh

-0.09

elf

-0.09

illo

-0.09

 Vend

-0.09

mem

-0.09

Positive Logits

 که

0.13

 that

0.12

 που

0.11

 которую

0.11

 kterou

0.11

 hogy

0.11

 which

0.11

 mà

0.10

 että

0.10

ázev

0.10
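
Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking tokens by the resulting score. The sketch below illustrates that ranking only. It assumes this dashboard follows the common approach, and the vectors, the toy vocabulary, and the helper names `logit_effects` and `top_and_bottom` are all hypothetical, not values from the real model.

```python
# Toy illustration of ranking tokens by a feature's logit effect.
# All numbers here are made up for the example (assumption, not model data).

def logit_effects(decoder_direction, unembedding):
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * w for d, w in zip(decoder_direction, column))
        for token, column in unembedding.items()
    }

def top_and_bottom(effects, k):
    """Return (k most positive, k most negative) tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]

# Hypothetical 2-dimensional residual stream and 4-token vocabulary.
decoder_direction = [0.5, -0.25]
unembedding = {
    " that": [0.25, 0.0],
    " which": [0.2, -0.04],
    "raf": [-0.25, 0.0],
    "gh": [-0.1, 0.2],
}

effects = logit_effects(decoder_direction, unembedding)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", [token for token, _ in positive])
print("negative:", [token for token, _ in negative])
```

In this toy setup the positive list starts with the token the feature promotes most and the negative list starts with the token it suppresses most, matching the ordering used in the lists above.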

Activations Density 0.262%
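
As a rough sanity check on the density figure, the sketch below assumes that activation density means the fraction of scanned tokens on which the feature has a nonzero activation, and that it was measured over the 10,000 prompts of 128 tokens listed under Configuration. Both are assumptions; the page does not state either. The helper `activation_density` is a hypothetical name.

```python
# Assumption: density = tokens with activation > 0, divided by all tokens scanned.
PROMPTS = 10_000           # from the Configuration section
TOKENS_PER_PROMPT = 128    # from the Configuration section
DENSITY = 0.262 / 100      # "Activations Density 0.262%"

def activation_density(activations):
    """Fraction of strictly positive entries in a flat list of per-token activations."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)

total_tokens = PROMPTS * TOKENS_PER_PROMPT
implied_active_tokens = round(DENSITY * total_tokens)

print(f"tokens scanned: {total_tokens:,}")
print(f"tokens with a nonzero activation (implied): {implied_active_tokens:,}")
```

Under these assumptions, a density of 0.262% corresponds to a few thousand active tokens out of roughly 1.28 million scanned.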