Explanations

Terms and phrases associated with jailbreaking devices, particularly smartphones.

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): cerebras/SlimPajama-627B

Negative Logits

"merce"    -0.07
" dilig"   -0.07
"uvo"      -0.07
"浪"       -0.06
" дол"     -0.06
"aliz"     -0.06
"ดน"       -0.06
"支"       -0.06
" Beled"   -0.06
"ẩm"       -0.06
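Dashboards often show non-ASCII tokens in logit lists like the one above in a byte-level form, for example `æµª` rather than 浪. The sketch below converts such display strings back to readable text. It assumes a GPT-2-style byte-to-character table, which is an assumption because the source does not name the tokenizer. `decode_token` is a hypothetical helper name, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the GPT-2-style map from byte values to printable stand-in characters."""
    # Bytes that are already printable keep their own character.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted to 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Hypothetical helper: turn a byte-level display string into readable text.

    Assumption: a literal space in the display string stands for byte 0x20,
    since the page shows leading spaces directly rather than as a marker.
    """
    raw = bytes(0x20 if ch == " " else UNICODE_TO_BYTE[ch] for ch in display)
    # Tokens can end mid-character, so invalid sequences are replaced, not raised.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for shown in ["æµª", " Ð´Ð¾Ð»", "æĶ¯", "áº©m"]:
        print(repr(shown), "->", repr(decode_token(shown)))
```

Under that assumption, the byte-level strings `æµª`, ` Ð´Ð¾Ð»`, `æĶ¯`, and `áº©m` decode to 浪, ` дол`, 支, and ẩm respectively. Plain ASCII tokens such as `merce` pass through unchanged.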

Positive Logits

"break"      0.10
"broken"     0.10
"breaking"   0.10
" jail"      0.10
"unlock"     0.09
" Jail"      0.09
" evasion"   0.09
"hack"       0.09
"breaker"    0.08
"code"       0.08

Activations Density: 0.005%
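To put the density figure in concrete terms, the arithmetic below estimates how many tokens in the dashboard prompt set would activate the feature. This is a rough sketch under two assumptions not stated in the source: that density means the fraction of tokens with a non-zero activation, and that it was measured over the same 24,576 prompts of 128 tokens listed under Configuration.

```python
# Values taken from the page.
n_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.005

# Total tokens in the dashboard prompt set.
total_tokens = n_prompts * tokens_per_prompt

# Assumption: density is the fraction of tokens on which the feature fires.
expected_active_tokens = total_tokens * (density_percent / 100)

print(f"total tokens: {total_tokens:,}")
print(f"estimated activating tokens: {expected_active_tokens:.0f}")
```

With these inputs the prompt set holds 3,145,728 tokens, so a density of 0.005% corresponds to roughly 157 activating tokens. That is consistent with a feature that fires only on a narrow topic.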