Explanations

documentation/annotation markers

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token              Logit
(no visible text)  0.40
" reform"          0.39
(no visible text)  0.38
" harm"            0.37
" expansion"       0.37
" health"          0.37
"row"              0.36
" softly"          0.36
" grow"            0.36

Positive Logits

Token               Logit
"----------------"  0.79
" NOTE"             0.61
"================"  0.60
"****************"  0.57
" ---------------"  0.55
"---------------"   0.54
" Certaines"        0.54
" Viele"            0.52
" Please"           0.51
" WARNING"          0.51

Activations Density 0.004%
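
To put the density figure in scale, the short sketch below converts it into an approximate token count. It rests on two assumptions that the dashboard does not state: that "Activations Density" is the share of token positions on which the feature has a nonzero activation, and that it was measured over the same prompt set listed under Configuration. The variable names are illustrative only.

```python
# Figures copied from the dashboard above.
num_prompts = 238_145
tokens_per_prompt = 512
density_percent = 0.004  # "Activations Density 0.004%"

# Total token positions in the dashboard prompt set.
total_tokens = num_prompts * tokens_per_prompt

# ASSUMPTION: density = fraction of token positions where the feature
# is active, measured over this same prompt set. The page does not
# confirm either point.
approx_active_tokens = round(total_tokens * density_percent / 100)

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {approx_active_tokens:,}")
```

Under those assumptions the feature fires on roughly 4,900 of about 122 million token positions, which fits a feature tied to a narrow class of tokens such as separator lines and NOTE/WARNING markers.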