Explanations

terms and phrases related to safeguarding and security

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token        Logit
"_defs"      -0.08
"azio"       -0.07
"hood"       -0.07
"onde"       -0.07
"ÐµÑģÑĤÐ¸"   -0.07
"icz"        -0.07
"Ý"          -0.07
"Ì"          -0.07
" Dann"      -0.07
"ERGE"       -0.07

Positive Logits

Token          Logit
" against"     0.13
"against"      0.10
" Against"     0.09
"ively"        0.09
"Against"      0.08
" interests"   0.08
" vulnerable"  0.08
" tegen"       0.08
" itself"      0.07
" fragile"     0.07

Activations Density 0.016%
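
To give a sense of scale for the density figure, the short sketch below estimates how many dashboard tokens the feature fires on. It assumes that "Activations Density" means the fraction of tokens in the dashboard prompt set (24,576 prompts of 128 tokens each) with a nonzero activation; the page itself does not define the term, so treat the result as a rough estimate. The reported 0.016% is also rounded, and the variable names are illustrative only.

```python
# Rough estimate only. Assumption (not stated on the page): density is the
# fraction of dashboard tokens on which the feature has a nonzero activation.
NUM_PROMPTS = 24_576                 # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128              # from "Prompts (Dashboard)"
ACTIVATION_DENSITY = 0.016 / 100     # "Activations Density 0.016%", as a fraction

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
expected_active_tokens = total_tokens * ACTIVATION_DENSITY

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {expected_active_tokens:.0f}")
```

Under that assumption the feature is active on roughly 500 of about 3.1 million tokens, which is consistent with a sparse feature.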