Explanations

hesitation and negation

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

 styleUrls      0.46
 innervation    0.44
 உதவு           0.42
诚              0.41
젬              0.41
未              0.40
 interessant    0.40
 دارید          0.39
 znajdują       0.39
ㄩ              0.39

Positive Logits

 hesitate       0.89
 hesitation     0.84
 doubt          0.82
shy             0.81
 hesitated      0.80
 hés            0.75
 underestimate  0.73
 deterred       0.69
 doubted        0.67
Hes             0.67

Activation density: 0.029%