Explanations

present significant risks

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits (token, value)

"֡"  0.43
" دی۔"  0.40
"storey"  0.39
" வித்திய"  0.38
"発見"  0.38
" nisid"  0.38
"噉"  0.38
"起了"  0.37
" Premi"  0.37
"olecules"  0.37

Positive Logits (token, value)

" છીએ"  0.40
" safety"  0.39
"憾"  0.38
" escap"  0.37
"athom"  0.37
" biztons"  0.37
" press"  0.36
" button"  0.36
" دہ"  0.36
"Unh"  0.36

Activations Density 0.008%