Explanations

difficult to memorize

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token               Logit
" reluctantly"      0.89
" unbearable"       0.88
" uncomfortable"    0.84
" unacceptable"     0.82
" unusable"         0.82
" poor"             0.81
" rough"            0.81
" unnecessary"      0.81
" unwillingness"    0.80
"緊"                0.80

Positive Logits

Token               Logit
" accidentally"     1.24
" easily"           1.14
" leakage"          1.11
" leaks"            1.10
" mistakes"         1.08
" accidents"        1.02
"Errors"            1.02
" Easily"           1.02
" بسه"              1.01
" theft"            1.00

Activations Density 0.509%