Explanations

avoiding excess and negative feelings

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

nonzero  0.44
รวจ  0.43
 incompar  0.41
"]},{"  0.40
훨  0.39
ដ  0.38
 করিয়াছিলেন  0.37
훨  0.36
Smiling  0.36
 দিয়া  0.36

Positive Logits

 excessive  1.05
 Excessive  0.98
Excess  0.94
excess  0.93
 eccess  0.88
 exces  0.86
 overkill  0.86
 excess  0.84
 Excess  0.84
 overly  0.80

Activations Density 0.135%