Explanations

awe, amazement, fascination, admiration

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each
Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token             Logit
" sukh"           0.77
" <!--<"          0.72
" unpleasant"     0.67
" undesirable"    0.64
" nonatomic"      0.63
"🙅"              0.62
"ńskiej"          0.62
" Redox"          0.62
" suicide"        0.62
"ໂ"               0.61

Positive Logits

Token             Logit
"awe"             2.46
" amazement"      2.07
" admiration"     1.99
" wonder"         1.94
" amazed"         1.89
" marvel"         1.88
"aw"              1.84
" fascination"    1.76
" fascinated"     1.72
" astonishment"   1.65

Activations Density 0.222%
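
For a rough sense of scale, the prompt count, prompt length, and density figures on this page can be combined. The Python sketch below assumes that the activation density is the fraction of dashboard tokens on which the feature fires, and that it was measured over the full 238,145-prompt set; neither assumption is stated on this page. The function and variable names are illustrative only.

```python
# Figures copied from this page.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
# 0.222% expressed as parts per 100,000 so the arithmetic stays in integers.
DENSITY_PER_100K = 222


def estimate_active_tokens(num_prompts, tokens_per_prompt, density_per_100k):
    """Return (total tokens, estimated tokens on which the feature is active).

    ASSUMPTION: density is a per-token fraction measured over all prompts.
    """
    total = num_prompts * tokens_per_prompt
    active = total * density_per_100k // 100_000  # floor of total * 0.00222
    return total, active


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PER_100K
)
print(f"total tokens:  {total_tokens:,}")
print(f"active tokens: {active_tokens:,} (estimate)")
```

Under these assumptions the feature would be active on roughly 270 thousand of about 122 million tokens. If the density was computed on a different sample, the estimate does not carry over.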