Explanations

lead to negative consequences

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

" щоб"        0.82
"voor"        0.77
"力和"        0.74
" mandato"    0.72
"upon"        0.69
"avaa"        0.67
" pentru"     0.66
"':["         0.66
" gegen"      0.65
"для"         0.65

Positive Logits

" nowhere"        0.90
" anywhere"       0.77
" astray"         0.73
"డ్డు"            0.72
"处"              0.70
" productive"     0.69
" reproducing"    0.68
" المخت"          0.68
" कहीं"           0.68
" stagnation"     0.68

Activations Density 0.073%