Explanations

health, safety, and social categories

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

" bijective"      0.28
" vanilla"        0.26
"癹"              0.26
"bei"             0.26
"กับ"             0.26
" sinned"         0.26
"های"             0.25
" phenomenal"     0.25
"க்கொண்டு"        0.25
"in"              0.25

Positive Logits

" метою"          0.27
" 書い"           0.25
"&/"              0.24
"-/"              0.23
"VOA"             0.23
"↵↵"              0.22
" लाह"            0.22
" гром"           0.22
" пора"           0.22

Activations Density: 1.623%