INDEX

Explanations

safe discussion, responsible exploration

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token        Value
濃厚         0.52
 चाहे        0.47
してしまう   0.46
 impatient   0.43
 scandalous  0.43
 şidd        0.43
 rushed      0.42
 heady       0.42
⚡           0.41
؍            0.41

Positive Logits

Token         Value
 safely       0.92
 harmless     0.86
 ONLY         0.80
 responsibly  0.80
 carefully    0.78
 cautiously   0.77
 bezpie       0.77
 tasteful     0.75
 gently       0.74
 осторо       0.73

Activations Density 0.355%