Explanations

sexually suggestive content

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

" consecuencia"  0.42
" মূ"  0.39
"Medit"  0.38
"阠"  0.38
"ισμό"  0.38
"σό"  0.38
"!-"  0.36
"畈"  0.36
" segundos"  0.36
"مول"  0.36

Positive Logits

" implying"  0.49
" implied"  0.42
" adult"  0.42
" suggesting"  0.42
" imply"  0.41
" suggests"  0.39
" suggest"  0.38
" especially"  0.38
" nine"  0.38
"rob"  0.37
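
Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding and ranking tokens by the resulting score. This page does not say how its values were computed, so the following is only a minimal sketch under that assumption. The helper names `logit_effects` and `top_and_bottom`, the two-dimensional vectors, and all numbers are hypothetical toy values, not data from this dashboard.

```python
def logit_effects(decoder_direction, unembedding):
    """Dot the feature direction with each token's unembedding vector.

    Assumption: `unembedding` maps token string -> vector of the same
    length as `decoder_direction`. Real models use a large matrix instead.
    """
    return {
        token: sum(d * w for d, w in zip(decoder_direction, vector))
        for token, vector in unembedding.items()
    }


def top_and_bottom(effects, k):
    """Return the k most boosted and the k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    most_positive = ranked[:k]
    most_negative = ranked[-k:][::-1]  # most suppressed token first
    return most_positive, most_negative


# Toy, made-up vectors purely for illustration.
direction = [1.0, 0.5]
unembed = {
    " implying": [0.4, 0.2],
    " adult": [0.3, 0.2],
    " segundos": [-0.3, -0.1],
    " consecuencia": [-0.4, -0.1],
}

effects = logit_effects(direction, unembed)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", [token for token, _ in positive])
print("negative:", [token for token, _ in negative])
```

Token strings keep their leading space because, in most tokenizers, " adult" and "adult" are different vocabulary entries.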

Activations Density 0.018%
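
To give the density figure some scale, the sketch below assumes that density means the fraction of token positions with a nonzero activation, and that it was measured over the full dashboard prompt set listed under Configuration. Neither assumption is confirmed by this page, and `activation_density` and `expected_active_tokens` are hypothetical helper names.

```python
def activation_density(activations):
    """Fraction of token positions where the feature activation is > 0."""
    total = len(activations)
    if total == 0:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / total


def expected_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Convert a density percentage into an approximate active-token count."""
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens, total_tokens * density_percent / 100.0


# Figures taken from the dashboard: 238,145 prompts, 512 tokens each, 0.018%.
total, active = expected_active_tokens(238_145, 512, 0.018)
print(f"total tokens: {total:,}")
print(f"approx. active tokens: {round(active):,}")

# Toy check of the density definition on a made-up activation list.
print(activation_density([0.0, 0.0, 1.5, 0.0]))
```

Under these assumptions the feature would fire on roughly 22 thousand of about 122 million token positions, which is consistent with a narrowly scoped feature.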