Explanations

human intent and preferences

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

"function"  0.53
"es"  0.52
(token not rendered)  0.49
"ாட்டு"  0.48
"ט"  0.47
"ксана"  0.46
" इनो"  0.46
"س"  0.45

Positive Logits

"rakt"  0.43
"왠"  0.41
"做了"  0.40
" decimated"  0.40
"批评"  0.38
" ridiculed"  0.38
" аргу"  0.38
"ことにより"  0.38
"一生"  0.38
" 말을"  0.38

Activation Density: 0.006%
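
Activation density is the share of token positions on which the feature fires. The sketch below assumes that "fires" means an activation strictly above zero. The page does not say which token set the 0.006% was measured on, so applying it to the 238,145 × 512 dashboard tokens is an illustration only. The function names are hypothetical.

```python
from typing import Iterable, Sequence


def activation_density(activations: Iterable[Sequence[float]], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    `activations` holds one sequence of per-token activations per prompt.
    """
    fired = 0
    total = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                fired += 1
    return fired / total if total else 0.0


def expected_firing_tokens(n_prompts: int, tokens_per_prompt: int, density_pct: float) -> float:
    """Tokens expected to fire if `density_pct` (a percentage) held over all tokens."""
    return n_prompts * tokens_per_prompt * density_pct / 100.0


if __name__ == "__main__":
    toy = [[0.0, 0.0, 1.3, 0.0], [0.0, 0.0, 0.0, 0.0]]
    print(activation_density(toy))  # 1 of 8 positions fire
    print(round(expected_firing_tokens(238_145, 512, 0.006)))
```

Under that assumption, 0.006% of the roughly 121.9 million dashboard tokens would be about 7,300 firing positions.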