Explanations

safety and security protocols

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token  Logit
"进"  0.45
(token missing)  0.43
"确"  0.41
" тыс"  0.41
"]，"  0.41
"树"  0.40
"punyai"  0.40
"icions"  0.40
"uine"  0.40

Positive Logits

Token  Logit
" safety"  0.54
" segurança"  0.52
" sicurezza"  0.51
"āo"  0.49
" bezpiecze"  0.48
(token missing)  0.46
" sécurité"  0.46
" Safety"  0.45
"FAO"  0.45
" безопасность"  0.45

Activations Density 0.008%
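
To give the density figure some scale, the following is a rough sketch of the arithmetic, not a statement of how the dashboard computes it. It assumes that the density is the fraction of all dashboard tokens on which the feature has a nonzero activation, and that every prompt contributes exactly 512 tokens. The function name `expected_active_tokens` is hypothetical.

```python
# Figures copied from the Configuration and Activations Density lines.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY_PERCENT = 0.008


def expected_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Estimate how many tokens activate the feature.

    Assumption (not confirmed by the source): density is the share of all
    dashboard tokens with a nonzero activation, expressed as a percentage.
    """
    total = num_prompts * tokens_per_prompt
    return total * density_percent / 100.0


total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
active = expected_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {active:,.0f}")
```

Under those assumptions, the feature would fire on roughly 9,754 of about 121.9 million tokens.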