Explanations

harming or hurting others

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

us      0.60
        0.59
        0.57
era     0.55
ן       0.55
in      0.54
se      0.54
ass     0.53
uja     0.53

Positive Logits

⤟             0.48
 outweighed   0.48
<unused37>    0.47
 dacă         0.45
 डिमांड        0.44
 densément    0.44
 கொள்         0.44
 chlorinated  0.44
 nitride      0.44
 değildir     0.44

Activations Density 0.002%