
Explanations

probability of dying


Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m


Negative Logits

Token           Logit
"犯"            -0.09
" destroyer"    -0.09
" Severity"     -0.09
" massacre"     -0.08
"ContentSize"   -0.08
"Neb"           -0.08
" lạc"          -0.08
" massac"       -0.08
"ovice"         -0.08
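Some tokens on this dashboard appear in raw form as strings like çĬ¯ or æŃ»äº¡. Strings like these look like GPT-2-style byte-level BPE, which stores every byte as a printable character. That is an assumption, since the dashboard does not name its tokenizer. Under that assumption, the sketch below rebuilds the byte table and decodes such strings: çĬ¯ becomes 犯 ("crime, to commit") and æŃ»äº¡ becomes 死亡 ("death"). The helper names `bytes_to_unicode` and `decode_token` are illustrative.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style byte -> printable-character table (assumed tokenizer scheme)."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, etc.) are shifted to code points 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Invert the table: printable character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a raw byte-level BPE token string back into UTF-8 text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["çĬ¯", "æŃ»äº¡", "æŃ»", "láº¡c"]:
    print(repr(tok), "->", decode_token(tok))
# 'çĬ¯' -> 犯
# 'æŃ»äº¡' -> 死亡
# 'æŃ»' -> 死
# 'láº¡c' -> lạc
```

In this scheme a leading space is stored as "Ġ" (byte 0x20), so `decode_token("Ġdeath")` returns `" death"`.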

Positive Logits

Token           Logit
" death"        0.36
" deaths"       0.31
" mortality"    0.31
"死亡"          0.27
"death"         0.26
" Mort"         0.25
" Death"        0.25
"死"            0.23
"ortality"      0.22
" Deaths"       0.22
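Token lists like these are often produced by projecting a feature's decoder direction through the model's unembedding matrix, then reading off the most positive and most negative entries. We assume, without confirmation from the dashboard, that it does the same (possibly with extra steps such as layer-norm folding). The sketch below uses tiny toy weights rather than the real model. `top_logit_effects`, `w_dec_row`, and `w_unembed` are hypothetical names.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Return the k most positive and k most negative direct logit effects
    of one feature's decoder direction (assumed method)."""
    effects = w_dec_row @ w_unembed          # shape: (d_vocab,)
    order = np.argsort(effects)              # vocabulary indices, ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy weights for illustration only (d_model = 2, d_vocab = 6); not real model weights.
vocab = [" death", " deaths", " mortality", " destroyer", " massacre", " the"]
w_unembed = np.array([
    [0.36, 0.31, 0.30, -0.09, -0.08, 0.01],
    [0.10, -0.20, 0.05, 0.40, -0.30, 0.00],
])
w_dec_row = np.array([1.0, 0.0])  # hypothetical decoder direction of one feature

positive, negative = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print(positive)   # [(' death', 0.36), (' deaths', 0.31)]
print(negative)   # [(' destroyer', -0.09), (' massacre', -0.08)]
```

A direct projection like this ignores everything downstream of the feature's layer, so it shows which tokens the feature pushes toward, not its full causal effect.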

Activation Density 0.076%
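Activation density is usually the fraction of token positions where the feature's activation is above zero. The sketch below assumes that definition and a zero threshold, and it also assumes the density was measured over the dashboard's 10,000 prompts of 128 tokens. Neither assumption is stated on the page. Under them, 10,000 × 128 = 1,280,000 positions, and 0.076% of that is about 973 positions. `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature fires above `threshold`."""
    return float(np.mean(np.asarray(acts) > threshold))


# Hypothetical activations: 10,000 prompts x 128 tokens = 1,280,000 positions,
# with the feature active on 973 of them.
acts = np.zeros((10_000, 128), dtype=np.float32)
acts.flat[:973] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")   # 0.076%
```

At this density a feature is quite sparse: it fires on well under one token in a thousand.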