Explanations

refusing inappropriate requests due to safety

explanatory follow-up sentences that refer back to the previous idea with a demonstrative lead-in (e.g., “This …”) to clarify, emphasize, or summarize it.

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

(tokens are quoted so that leading spaces stay visible)

"]]);"      0.38
"لبية"      0.38
"ندر"       0.37
"قطه"       0.37
" blevet"   0.36
" 更新"     0.36
"ފ"         0.34
"更新"      0.34
" তবুও"     0.34
"YPE"       0.34

Positive Logits

(tokens are quoted so that leading spaces stay visible)

"为此"          0.75
"そのため"      0.67
" त्यासाठी"     0.65
" Dafür"        0.63
" requires"     0.57
"requires"      0.57
" therefore"    0.56
"therefore"     0.55
" deshalb"      0.55
" Requires"     0.55

Activations Density 0.292%
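
As a rough sanity check, the density figure can be combined with the prompt configuration above to estimate how many tokens the feature fires on. The sketch below assumes that "Activations Density" means the fraction of all dashboard tokens with a nonzero activation, and that every prompt is exactly 512 tokens long. Neither assumption is confirmed by the page, and the variable names are illustrative only.

```python
# Figures copied from the dashboard configuration and density line.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY_PERCENT = 0.292  # assumed: share of tokens with a nonzero activation

# Total tokens in the dashboard run, assuming no prompt is shorter than 512.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Estimated number of tokens on which the feature is active.
estimated_active_tokens = round(total_tokens * DENSITY_PERCENT / 100)

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,}")
```

Under these assumptions the feature would be active on a few hundred thousand of roughly 122 million tokens, which is in line with a sparse feature.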