Explanations

themes related to challenging social norms and conventions

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token         Logit
"rana"        -0.07
"/live"       -0.06
"DEC"         -0.06
"dup"         -0.06
"roup"        -0.06
"enal"        -0.06
"estre"       -0.06
"ÙĨØ¨"        -0.06
"celed"       -0.06
" pending"    -0.06

Positive Logits

Token              Logit
" altogether"      0.07
" conventional"    0.07
" restr"           0.07
" conventions"     0.07
" convention"      0.07
" challeng"        0.07
"-alist"           0.07
"alto"             0.06
" traditional"     0.06
" expectations"    0.06

Activations Density 0.022%
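
To put the density figure in context, the sketch below works out roughly how many token positions it corresponds to. It rests on two assumptions not stated on the page: that density means the fraction of token positions where the feature's activation is above zero, and that it was measured over the 24,576 prompts of 128 tokens listed under Configuration. The helper name `activation_density` is hypothetical.

```python
# Hypothetical helper: fraction of token positions where the feature fires.
# Assumes "density" means activation > threshold, which the page does not confirm.
def activation_density(activations, threshold=0.0):
    total = 0
    active = 0
    for prompt in activations:          # one list of per-token activations per prompt
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Token budget taken from the Configuration section.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
TOTAL_TOKENS = NUM_PROMPTS * TOKENS_PER_PROMPT

# Reported density is 0.022%; assumed to refer to the token budget above.
REPORTED_DENSITY_PERCENT = 0.022
IMPLIED_ACTIVE_TOKENS = round(TOTAL_TOKENS * REPORTED_DENSITY_PERCENT / 100)

if __name__ == "__main__":
    print(TOTAL_TOKENS)           # total token positions in the prompt set
    print(IMPLIED_ACTIVE_TOKENS)  # approximate positions where the feature is active
```

Under these assumptions the feature is active on only a few hundred of roughly three million token positions, which is consistent with a sparse, narrowly scoped feature.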