Explanations

specific closing words

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

"Act"        0.90
"Methods"    0.90
"Path"       0.83
"age"        0.82
"methods"    0.82
"І"          0.81
"only"       0.81
"As"         0.80
"ess"        0.80
"и"          0.79

Positive Logits

" LinkedList"    0.93
" audacious"     0.88
" cappuccino"    0.87
" guava"         0.84
" furry"         0.81
"cel"            0.80
" rebranding"    0.80
" hairy"         0.80
" monstrous"     0.80
" multitude"     0.79

Activations Density 0.001%