Explanations

explicit content requests

Requests containing explicit sexual content, exploitative themes, or hateful slurs; that is, unsafe prompts that violate content policy.

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

 বিবেচনায়  0.48
 reporting  0.40
 forecast  0.38
↵  0.38
对  0.38
 Assessing  0.37
 assessment  0.37
 Reporting  0.37
 typically  0.37
From  0.37

Positive Logits

 ainult  0.48
 HOWEVER  0.46
 Sachen  0.45
 terkenal  0.44
됬  0.44
 blatantly  0.43
жалуйста  0.43
 horrible  0.43
 recomiendo  0.43
 Pikachu  0.43
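Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction onto the model's unembedding vectors and keeping the tokens with the largest and smallest scores. The page does not say how these values were computed, so the following is only a minimal Python sketch under that assumption. The helpers `feature_logit_effects` and `top_logits` and the toy 2-dimensional vectors are hypothetical. A real pipeline would use the actual decoder weights and unembedding matrix, typically with an array library.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, float]


def feature_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Hypothetical helper: score each token by the dot product of the
    feature's decoder direction with that token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_logits(effects: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most-promoted and the k most-suppressed tokens."""
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return positive, negative


# Toy example; these vectors are invented for illustration only.
decoder_dir = [1.0, -0.5]
unembed = {
    " horrible": [0.4, -0.2],
    " reporting": [-0.3, 0.2],
    " typically": [0.1, 0.1],
}
effects = feature_logit_effects(decoder_dir, unembed)
positive, negative = top_logits(effects, k=1)
print(positive)  # [(' horrible', 0.5)]
print(negative)
```

Using `heapq.nlargest` and `heapq.nsmallest` avoids sorting the whole vocabulary when only the top k tokens in each direction are needed.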

Activations Density 0.221%
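Activation density is usually reported as the fraction of token positions on which a feature's activation is nonzero (or above a small threshold). The page gives only the final figure, so the sketch below assumes a threshold of zero; `activation_density` and the toy data are hypothetical. It also converts the configuration numbers into an approximate count of active tokens, under the unconfirmed assumption that density was measured over every token of all 238,145 dashboard prompts at 512 tokens each.

```python
from typing import Iterable, Sequence


def activation_density(
    activations: Iterable[Sequence[float]], threshold: float = 0.0
) -> float:
    """Hypothetical helper: fraction of token positions whose activation
    is strictly greater than `threshold`."""
    active = total = 0
    for prompt_acts in activations:
        for value in prompt_acts:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Toy example: 2 prompts x 4 tokens, the feature fires on 2 of 8 positions.
toy = [[0.0, 1.2, 0.0, 0.0], [0.0, 0.0, 0.3, 0.0]]
print(activation_density(toy))  # 0.25

# Rough scale check against the configuration, ASSUMING density covers
# every token of every dashboard prompt (the page does not confirm this).
total_tokens = 238_145 * 512
print(total_tokens)  # 121930240
print(round(total_tokens * 0.221 / 100))  # 269466
```

Because 0.221% is itself a rounded figure, and padding or special tokens may be excluded from the count, the result of roughly 269,000 active tokens should be read as an order-of-magnitude estimate only.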