INDEX

Explanations

studies showing effects and outcomes

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

" hopefully"     0.49
" Hopefully"     0.47
"Hopefully"      0.46
" Allow"         0.42
"我们需要"        0.41
" אר"            0.41
" هنعمل"         0.40
" хотим"         0.40
" perlu"         0.39
" Screenshot"    0.38

Positive Logits

" studies"        0.87
"Studies"         0.81
" Studies"        0.80
" research"       0.79
" statistically"  0.78
" empirically"    0.77
"研究"            0.75
"studies"         0.75
" researchers"    0.73
" penelitian"     0.73
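
The dashboard does not say how these logit lists are produced. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and list the tokens with the largest and smallest resulting scores. The sketch below illustrates that idea in plain Python; the function names, the two-dimensional toy vectors, and the scores are all hypothetical and are not taken from the dashboard.

```python
def logit_effects(decoder_direction, unembedding, vocab):
    """Score each vocabulary token by the dot product of the feature's
    decoder direction with that token's unembedding row (assumed method)."""
    effects = []
    for token, row in zip(vocab, unembedding):
        score = sum(d * w for d, w in zip(decoder_direction, row))
        effects.append((token, score))
    return effects


def top_and_bottom(effects, k):
    """Return the k most promoted and the k most suppressed tokens."""
    ranked = sorted(effects, key=lambda pair: pair[1], reverse=True)
    most_positive = ranked[:k]
    most_negative = ranked[-k:][::-1]  # most suppressed token first
    return most_positive, most_negative


# Toy data: invented numbers, chosen only to make the example readable.
toy_vocab = [" studies", " research", " hopefully", " Allow"]
toy_unembedding = [
    [0.9, 0.1],
    [0.8, -0.2],
    [-0.5, 0.3],
    [-0.4, 0.0],
]
toy_direction = [1.0, 0.0]

effects = logit_effects(toy_direction, toy_unembedding, toy_vocab)
positive, negative = top_and_bottom(effects, k=2)
positive_tokens = [token for token, _ in positive]
negative_tokens = [token for token, _ in negative]
```

In this toy setup `positive_tokens` lists the tokens the feature pushes the model toward and `negative_tokens` the ones it pushes away from, mirroring the two lists above.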

Activations Density 0.134%
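
Activation density is usually read as the fraction of tokens on which the feature fires. The sketch below shows that calculation on a tiny invented batch; the function name, the `threshold` parameter, and the toy activations are hypothetical. It also computes the total token count implied by the configuration above (238,145 prompts of 512 tokens each). Treating that set as the one the density was measured on is an assumption.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    `activations` is a list of prompts, each a list of per-token
    activation values for a single feature.
    """
    total = 0
    active = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Toy batch: two prompts of four tokens, with one active token overall.
toy_activations = [
    [0.0, 0.0, 1.2, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
toy_density = activation_density(toy_activations)  # 1 of 8 tokens

# Token count implied by the dashboard configuration.
PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
total_tokens = PROMPTS * TOKENS_PER_PROMPT  # 121,930,240
```

If the reported 0.134% was measured over all of those tokens, the feature would be active on roughly 0.00134 × 121,930,240 of them, or on the order of 160,000 tokens.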