
Explanations

avoiding protecting refusing


Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1
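The prompt configuration above implies a total token count for the dashboard set. Here is a minimal arithmetic check, assuming every prompt is exactly 512 tokens. The page does not describe any padding or truncation.

```python
# Values taken from the dashboard configuration.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512

# Total token positions covered by the dashboard prompts.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
print(f"{total_tokens:,}")  # 121,930,240
```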


Negative Logits

"oje": 0.55
" აღმასრულ": 0.49 (Georgian word fragment)
" Zuge": 0.48 (German)
"],$": 0.48
" cementing": 0.46
" enlarging": 0.45
" rebellious": 0.45
" Carlyle": 0.45
" slimy": 0.45
" rougeâtre": 0.45 (French, 'reddish')

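Positive Logits

"ก": 0.51 (Thai letter)
"都有": 0.42 (Chinese, 'all have')
"设": 0.42 (Chinese, 'set up; suppose')
"canvas": 0.41
"panel": 0.41
"说明": 0.41 (Chinese, 'explanation; description')
"is": 0.40
"ру": 0.40 (Cyrillic fragment)
" با": 0.40 (Persian, 'with')
"测试": 0.40 (Chinese, 'test')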
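Feature dashboards of this kind commonly derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then ranking the vocabulary by the result. The sketch below assumes that method, which the page does not state, and it ignores any final layer-norm scaling. The function names, the two-dimensional toy model, and the placeholder tokens tok_a to tok_d are hypothetical. This is not Neuronpedia's code.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a decoder direction (length d_model) through an unembedding
    matrix w_u with d_model rows and vocab_size columns."""
    vocab_size = len(w_u[0])
    return [
        sum(d * row[j] for d, row in zip(decoder_dir, w_u))
        for j in range(vocab_size)
    ]

def top_and_bottom(vocab: Sequence[str], effects: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # largest projections first
    negative = ranked[:k]        # most negative projections first
    return positive, negative

# Toy example (hypothetical values): d_model = 2, vocabulary of 4 placeholder tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, -0.5]
w_u = [
    [0.3, 0.2, -0.1, 0.0],
    [-0.2, 0.1, 0.4, 0.8],
]
effects = logit_effects(decoder_dir, w_u)
positive, negative = top_and_bottom(vocab, effects, k=2)
print([t for t, _ in positive])  # ['tok_a', 'tok_b']
print([t for t, _ in negative])  # ['tok_d', 'tok_c']
```

In a real model, decoder_dir would be the feature's row of the SAE decoder and w_u would be the model's unembedding. The example only illustrates the ranking procedure.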

Activation Density: 0.003%
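Activation density is the fraction of token positions on which the feature fires. Here is a minimal sketch, assuming that "fires" means an activation strictly above zero. The page does not state its threshold, and the function name is hypothetical. The final lines estimate how many token positions 0.003% corresponds to, under a further assumption: that density was measured over the full 238,145 × 512 dashboard set.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy example: the feature fires on 1 of 4 token positions.
print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 0.25

# Rough scale of 0.003% over the dashboard set (assumption: same token pool).
total_tokens = 238_145 * 512
expected_active = total_tokens * 0.003 / 100
print(round(expected_active))  # 3658
```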