Explanations

refusing harmful requests

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token             Logit
"۱"               1.21
" thirteen"       1.15
"valueChanged"    1.15
"５"              1.13
" cinco"          1.12
" five"           1.08
" tres"           1.07
" vingt"          1.07
"۷"               1.07
" acht"           1.06

Positive Logits

Token             Logit
"ceq"             1.61
"צו"              1.56
"primir"          1.54
"Dept"            1.53
" perio"          1.49
"cps"             1.48
"ostr"            1.48
" weap"           1.47
"mesine"          1.46
"חו"              1.46

Activations Density 0.014%