Explanations

Terms related to safety, particularly in contexts involving children

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each
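The size of the dashboard sample follows directly from these two numbers. The short check below uses only the prompt count and prompt length given above to compute the total number of tokens.

```python
# Dashboard sample size, taken from the configuration above.
num_prompts = 24_576
tokens_per_prompt = 128

# Total tokens across all dashboard prompts.
total_tokens = num_prompts * tokens_per_prompt
print(f"{total_tokens:,}")  # 3,145,728
```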

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

"soever"    -0.08
"lish"      -0.08
"regor"     -0.08
"hift"      -0.07
"atre"      -0.07
"celed"     -0.07
"ethyst"    -0.07
"inous"     -0.07
"alyzed"    -0.07
"ASE"       -0.07

Positive Logits

"hav"       0.07
"ies"       0.07
"ord"       0.07
" Chung"    0.06
"ient"      0.06
"AreaView"  0.06
" Safe"     0.06
" ness"     0.06
"ucci"      0.06
"sym"       0.06

Activations Density

0.009%
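As a rough sketch, the displayed density can be turned into an approximate count of activating tokens. This assumes, although the page does not say so, that the density is measured over the dashboard sample of 24,576 prompts of 128 tokens each. If it was computed on a different corpus or sample, the numbers below do not apply.

```python
# Assumption: density = fraction of tokens in the dashboard sample
# (24,576 prompts x 128 tokens) on which the feature is active.
num_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.009  # as displayed; the dashboard value is rounded

total_tokens = num_prompts * tokens_per_prompt
expected_active = total_tokens * density_percent / 100
print(round(expected_active))  # 283
```

Because 0.009% is rounded, a displayed value anywhere from 0.0085% to 0.0095% would correspond to roughly 267 to 299 tokens under the same assumption.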