INDEX

Explanations

positive affirmations regarding conditions or instructions

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each
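
As a quick sanity check on the size of the prompt set, the two figures above imply a total token count. A minimal sketch of the arithmetic follows; the variable names are illustrative and do not come from the dashboard.

```python
# Figures taken from the dashboard configuration.
num_prompts = 24_576
tokens_per_prompt = 128

# Total number of tokens the dashboard statistics are computed over.
total_tokens = num_prompts * tokens_per_prompt
print(f"{total_tokens:,}")  # 3,145,728
```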

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

must

-0.06

CISION

-0.06

 Tire

-0.06

必须

-0.06

onis

-0.06

 必

-0.06

]|[

-0.06

ald

-0.06

 ピ

-0.06

аря

-0.06

Positive Logits

OK

0.10

 safe

0.10

OK

0.09

 okay

0.09

safe

0.09

-safe

0.08

ok

0.08

 Safe

0.08

 safely

0.08

 proceed

0.08
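
The page does not say how the logit values are computed. On dashboards of this kind they are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and listing the most negative and most positive entries. The sketch below assumes that convention and uses a made-up five-token vocabulary with made-up weights; `decoder_direction`, `unembedding`, and `logit_effects` are hypothetical names, and none of the numbers are taken from the real model.

```python
# Toy vocabulary and weights, purely illustrative (not the real model's values).
vocab = ["OK", " safe", " proceed", "must", " Tire"]
decoder_direction = [0.5, -0.2, 0.1]  # hypothetical feature direction, d_model = 3
unembedding = [                       # hypothetical matrix, shape d_model x vocab_size
    [0.2, 0.1, 0.1, -0.1, 0.0],
    [0.0, -0.1, 0.1, 0.1, 0.2],
    [0.1, 0.2, 0.3, -0.2, -0.1],
]


def logit_effects(direction, w_u):
    """Project a feature direction through the unembedding matrix."""
    vocab_size = len(w_u[0])
    return [
        sum(direction[i] * w_u[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]


effects = logit_effects(decoder_direction, unembedding)

# Sort tokens from most promoted to most suppressed.
ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1], reverse=True)
top_positive = ranked[:2]          # tokens the feature pushes up the most
top_negative = ranked[-2:][::-1]   # tokens the feature pushes down the most

for token, value in top_positive + top_negative:
    print(f"{token!r:12} {value:+.2f}")
```

Under this assumption, a positive list dominated by "OK", "safe", and "proceed" would mean the feature nudges the model toward emitting those tokens when it is active.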

Activations Density 0.035%
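
The page does not define the density. Assuming it is the fraction of dashboard tokens on which the feature has a nonzero activation, it can be converted into an approximate token count. This is a rough sketch: the displayed percentage is itself rounded, and the variable names are illustrative.

```python
# Figures taken from the dashboard; the interpretation of "density" is an assumption.
num_prompts = 24_576
tokens_per_prompt = 128
density = 0.035 / 100  # 0.035% expressed as a fraction

total_tokens = num_prompts * tokens_per_prompt

# Approximate number of tokens on which the feature fires.
approx_active_tokens = round(total_tokens * density)
print(approx_active_tokens)  # 1101
```

On this reading the feature fires on roughly 1,100 of about 3.1 million tokens, which is consistent with a sparse, narrowly scoped feature.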