Explanations

references to jailbreaking and related technical processes

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): cerebras/SlimPajama-627B

Negative Logits

Token          Logit
"ema"          -0.09
"UrlParser"    -0.07
"_OBJC"        -0.07
"IID"          -0.07
" reserv"      -0.07
".Encoding"    -0.07
"shm"          -0.07
"レット"       -0.07
" čt"          -0.07
"dana"         -0.06

Positive Logits

Token                  Logit
"qq"                   0.07
"vil"                  0.06
" Boots"               0.06
" power"               0.06
" bypass"              0.06
" thủ"                 0.06
" onActivityResult"    0.06
"uhn"                  0.05
" jail"                0.05
"ooter"                0.05

Activations Density 0.007%
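
To give the density figure a sense of scale, the short sketch below converts it into an approximate token count. It assumes that the density is measured over the dashboard prompt set listed under Configuration (24,576 prompts of 128 tokens each); the page does not state this, and the 0.007% figure is rounded, so the result is only a rough estimate. The function and constant names are illustrative and are not part of any dashboard API.

```python
# Rough scale check for the activation density shown on the dashboard.
# Assumption: density is computed over the dashboard prompts
# (24,576 prompts x 128 tokens). The page does not confirm this.

NUM_PROMPTS = 24_576        # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128     # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.007     # from "Activations Density 0.007%" (rounded)


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Number of token positions in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Approximate number of positions where the feature is active."""
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)

print(f"{total:,} token positions in total")
print(f"roughly {active:.0f} positions with a nonzero activation")
```

Under these assumptions the feature fires on only a couple of hundred of the roughly three million token positions, which fits a narrowly scoped feature.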