Explanations

refusing inappropriate requests

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

[token missing]  1.07
"、"  0.94
" though"  0.92
"،"  0.91
"albeit"  0.90
" serta"  0.89
"﹑"  0.87
"(),"  0.87
" meant"  0.86
" donc"  0.86

Positive Logits

" Whereas"  1.01
"Whereas"  0.93
"whereas"  0.83
" Hvis"  0.78
" whereas"  0.74
" Meanwhile"  0.70
"ром"  0.67
" exertions"  0.65
"자가"  0.65
" Sedangkan"  0.65

Activations Density 0.130%