Explanations

harmful, normalize, contribute

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

" środ"  0.42
" فیصلے"  0.38
"vü"  0.37
" Geist"  0.35
" licha"  0.35
" judiciaire"  0.35
" तु"  0.34
"\;"  0.34
"েন্টের"  0.34
"寸"  0.34

Positive Logits

" harm"  1.24
" Harm"  1.10
" harms"  1.07
"harm"  1.04
" harmed"  1.01
"Harm"  0.97
" harmful"  0.94
"harmed"  0.94
" damage"  0.86
"damage"  0.81

Activations Density: 0.459%