Explanations

exemptions and leniency

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

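Negative Logits

" formative"    0.80
"꼭"            0.78  (Korean: without fail)
"forder"        0.73
"zor"           0.73
"每个"          0.72  (Chinese: each)
"が必要です"    0.71  (Japanese: is necessary)
"ورة"           0.70
"刻"            0.70
"每個"          0.69  (Traditional Chinese: each)
"搗"            0.68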

Positive Logits

" exempt"       1.69
" exempted"     1.62
" exemption"    1.60
" nonchal"      1.55
" lenient"      1.51
" tolerate"     1.50
" immunity"     1.50
" reprieve"     1.50
" tolerant"     1.50
" unscathed"    1.49
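Token rankings like the two lists above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting vocabulary tokens by the resulting score. The following is a minimal pure-Python sketch that assumes this logit-lens style computation. The dashboard does not say whether it also applies a final layer norm, a bias, or any rescaling. The names top_and_bottom_logits, direction, unembed, and vocab are hypothetical.

```python
def top_and_bottom_logits(direction, unembed, vocab, k=10):
    """Rank tokens by how strongly a feature direction pushes their logits.

    direction: d_model floats (hypothetical feature decoder vector)
    unembed:   d_model rows, each holding vocab_size floats (W_U)
    vocab:     vocab_size token strings
    Returns (top_positive, top_negative), each a list of (token, score),
    with the most extreme score first.
    """
    vocab_size = len(vocab)
    scores = [0.0] * vocab_size
    # scores = direction @ unembed, computed one row of W_U at a time
    for d, row in zip(direction, unembed):
        for j in range(vocab_size):
            scores[j] += d * row[j]
    # Sort tokens from most positive to most negative score
    ranked = sorted(zip(vocab, scores), key=lambda p: p[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy example: 2-dimensional residual stream, 3-token vocabulary
pos, neg = top_and_bottom_logits(
    [1.0, -1.0],
    [[1.0, 0.0, 3.0], [0.0, 1.0, 1.0]],
    ["a", "b", "c"],
    k=1,
)
print(pos, neg)
```

In the toy example, token "c" receives the largest boost (3 - 1 = 2) and "b" the strongest suppression (-1). Against a real model's unembedding matrix, these correspond to the positive and negative logit lists.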

Activation Density: 1.594%
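Activation density is usually read as the fraction of token positions at which a feature fires. The following is a minimal sketch under two assumptions: "fires" means an activation strictly above a threshold of zero, and the fraction is taken over every token in the sampled prompts. The dashboard does not state its exact threshold or token sample. The function name activation_density and its threshold parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    activations: list of prompts, each a list of per-token activations
    threshold:   assumed firing threshold (hypothetical; 0.0 by default)
    """
    total = 0
    active = 0
    for prompt in activations:
        for a in prompt:
            total += 1
            if a > threshold:  # count only positions where the feature fires
                active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return active / total


# Two 4-token prompts; the feature fires on 2 of the 8 positions
density = activation_density([[0.0, 1.2, 0.0, 0.0], [0.0, 0.0, 0.3, 0.0]])
print(f"{density:.3%}")
```

On this toy input, the feature fires on 2 of 8 positions, giving a density of 25%. A figure of 1.594% means the feature is silent on the large majority of tokens.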