Explanations

contraction for 'is' or 'was'

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token         Logit
',            0.42
怆            0.39
ldef          0.39
uka           0.38
"',           0.38
axon          0.38
ênd           0.37
 lleve        0.37
屹            0.37

Positive Logits

Token         Logit
 asleep       0.63
 complaining  0.59
 jealous      0.57
 snoring      0.56
 impatient    0.55
 homophobic   0.54
 unwell       0.52
 allergic     0.51
 schizophren  0.51
 rumoured     0.51

Activations Density 0.008%
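
To give the density figure a sense of scale, the sketch below combines it with the prompt counts listed under Configuration. It assumes that "Activations Density" is the percentage of all dashboard tokens on which the feature has a nonzero activation; the page does not state that definition, so treat the result as a rough estimate. The function and constant names are illustrative only.

```python
# Figures copied from the dashboard above.
NUM_PROMPTS = 238_145               # "238,145 prompts"
TOKENS_PER_PROMPT = 512             # "512 tokens each"
ACTIVATION_DENSITY_PERCENT = 0.008  # "Activations Density 0.008%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated count of positions where the feature fires.

    Assumption: density is a percentage of all token positions.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"{total:,} token positions in total")
print(f"about {active:,.0f} positions with a nonzero activation")
```

Under that assumption the feature fires on roughly ten thousand of about 122 million token positions. That sparsity fits a feature that only responds in a narrow context, such as the contraction described in the explanation.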