INDEX

Explanations

dangerous, weapons, death, or financial contexts


Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1


Negative Logits

 synced                        0.39
bmod                           0.39
optimal                        0.39
 incompar                      0.38
max                            0.38
 devoid                        0.38
[token not shown]              0.37
新增 (Chinese, "newly added")   0.37
 habitually                    0.37
が存在 (Japanese, "exists")     0.36

Positive Logits

The                            0.54
 Women                         0.49
 Medical                       0.47
 National                      0.46
 Rainbow                       0.46
 Financial                     0.45
[token not shown]              0.44
 Justice                       0.44
the                            0.43
 Prince                        0.43

Activations Density 0.000%