INDEX

Explanations

"that" followed by a pronoun

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

" incurring"    1.83
" robbing"      1.81
" теат"         1.79
" neither"      1.77
"吝"            1.76
" raising"      1.75
"𝖑"             1.68
" дина"         1.66
" draining"     1.66
" suicidal"     1.66

Positive Logits

" marito"       1.62
                1.55
"verifier"      1.52
" Verdict"      1.52
"vad"           1.49
" voie"         1.48
"vej"           1.46
" lengan"       1.42
"leads"         1.41
" écran"        1.41

Activations Density 0.002%