INDEX

Explanations

observation repeats N times

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token                 Value
"++++++++++++++++"    0.38
"wra"                 0.37
"SP"                  0.35
"]/"                  0.34
" watershed"          0.34
"CIRCLE"              0.33
" sund"               0.33
" zach"               0.33
" continua"           0.33
"//}"                 0.33

Positive Logits

Token                 Value
"Cbd"                 0.51
" Reveals"            0.50
" Secret"             0.48
" Utilizing"          0.47
" magnificence"       0.46
" Secrets"            0.46
" Superstar"          0.46
" Concerning"         0.45
" ТО"                 0.45
" Regarding"          0.44

Activations Density 0.001%