Explanations

aka, breakdown, underrated

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

" Mitigation": 0.35
"负": 0.34
"呵呵": 0.33
" Luigi": 0.33
"吔": 0.33
"ইতি": 0.32
"тную": 0.32
(blank token): 0.32
" aran": 0.31
" دیں۔": 0.31

Positive Logits

" donc": 0.38
"eum": 0.38
" constante": 0.37
" hence": 0.36
"Sá": 0.36
" certaines": 0.35
"mag": 0.35
" pronounce": 0.35
" prevail": 0.34
"्रा": 0.34

Activations Density 0.733%