INDEX

Explanations

"Will" followed by names or phrases

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

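Negative Logits

Tokens are shown in quotes; a leading space is part of the token.

Token            Value   English gloss
"maybe"          0.41
"in"             0.39
" 실제"          0.39    Korean: "actual, in fact"
" можливо"       0.38    Ukrainian: "maybe, possibly"
"可能有"         0.38    Chinese: "there may be"
" Throwable"     0.37
"//"             0.37
"sensitivity"    0.36
" sensitivity"   0.36
"可能是"         0.36    Chinese: "may be, possibly"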

Positive Logits

Token            Value   English gloss
" gladly"        0.63
" NEVER"         0.53
" never"         0.52
"IAMS"           0.50
" niemals"       0.48    German: "never"
" jamás"         0.46    Spanish: "never"
"amette"         0.46
" WILL"          0.46
" Never"         0.45
" abide"         0.45
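The dashboard does not say how the two logit lists are computed. Sparse-autoencoder dashboards commonly rank tokens by projecting a feature's decoder direction onto the model's unembedding vectors, keeping the highest scores as positive logits and the lowest as negative logits. The Python sketch that follows assumes that method; the function names, the vector shapes, and the toy numbers are hypothetical and are not taken from this dashboard.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: the feature's decoder direction and one
# unembedding vector per vocabulary token, all of the same dimensionality.
Vector = List[float]


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Score each token by the dot product of its unembedding vector with the
    feature's decoder direction (assumed method, not confirmed by the source)."""
    return {
        token: sum(d * w for d, w in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_bottom_tokens(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring tokens and the k lowest-scoring tokens."""
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    positive = ranked[:k]                  # highest score first
    negative = list(reversed(ranked))[:k]  # lowest score first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional vectors, invented purely for illustration.
    decoder_dir = [1.0, 0.0]
    unembed = {
        " gladly": [0.7, 0.0],
        " never": [0.6, 0.1],
        " WILL": [0.5, 0.3],
        "maybe": [-0.4, 0.2],
    }
    pos, neg = top_bottom_tokens(logit_effects(decoder_dir, unembed), k=2)
    print(pos)  # [(' gladly', 0.7), (' never', 0.6)]
    print(neg)  # [('maybe', -0.4), (' WILL', 0.5)]
```

With a real model the same ranking would run over the full vocabulary, which is why subword fragments such as "IAMS" and "amette" can appear next to whole words.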

Activation Density: 0.003%
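The page does not define activation density. A common definition for sparse-autoencoder features is the fraction of token positions on which the feature's activation is above zero. The minimal Python sketch that follows assumes that definition, and further assumes (not stated on the page) that the 0.003% figure was measured over the configured 238,145 prompts of 512 tokens each; the helper names are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def expected_active_tokens(density_percent: float, n_prompts: int, tokens_per_prompt: int) -> float:
    """Convert a density given in percent into an approximate count of active tokens."""
    return density_percent / 100 * n_prompts * tokens_per_prompt


if __name__ == "__main__":
    print(activation_density([0.0, 0.0, 1.5, 0.0]))  # 0.25
    # Assumes density was measured over all 238,145 x 512 configured tokens.
    print(round(expected_active_tokens(0.003, 238_145, 512)))  # 3658
```

Under those assumptions the feature would be active on roughly 3,658 of 121,930,240 token positions. Because 0.003% is a rounded display value, that count is only approximate.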