Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

" caracteres"     0.51
" actores"        0.50
"䒾"              0.50
" nutrientes"     0.49
" utilizó"        0.48
" caracteriza"    0.47
" trabajadores"   0.47
" utiliza"        0.47
"に適"            0.46
" acteurs"        0.46

Positive Logits

"op"               0.50
" transcendence"   0.48
"As"               0.48
"or"               0.48
"Remark"           0.47
"Cl"               0.46
"ub"               0.46
"⁣⁣"               0.46
"Trans"            0.45
"ability"          0.45

Activations Density 0.003%
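
To give the density figure a sense of scale, the short sketch below combines it with the prompt configuration listed above. It assumes (this is not stated on the page) that "Activations Density" means the percentage of tokens in the dashboard prompt set on which the feature has a nonzero activation. The constant and function names are illustrative only, and because the density is displayed to a single significant figure, the result is a rough estimate rather than an exact count.

```python
# Figures copied from the dashboard text; the names are illustrative.
NUM_PROMPTS = 238_145                 # "238,145 prompts"
TOKENS_PER_PROMPT = 512               # "512 tokens each"
ACTIVATION_DENSITY_PERCENT = 0.003    # "Activations Density 0.003%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Rough count of tokens where the feature fires.

    Assumes the density is a percentage of all tokens in the set.
    """
    return round(total * density_percent / 100)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"{total:,} tokens in the dashboard set")        # 121,930,240
print(f"about {active:,} tokens with the feature active")  # about 3,658
```

Under that assumption the feature would fire on only a few thousand of roughly 122 million tokens, which is consistent with a very sparse feature.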