INDEX

Explanations

references to stereotypes and critiques of their impacts

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token       Logit
"lier"      -0.08
"ness"      -0.08
"arde"      -0.07
"íģ¼"       -0.07
"rate"      -0.07
"iero"      -0.07
"itation"   -0.07
"elter"     -0.07
"iane"      -0.07
"riel"      -0.06

Positive Logits

Token           Logit
"otypical"      0.07
"ically"        0.07
"istical"       0.07
"ENARIO"        0.07
"Ú¯Ø±"          0.07
"ëł"            0.06
"éĢļãĤĬ"        0.06
" ÑģÐ¾Ð±Ð¾Ð¹"   0.06
" Ster"         0.06
"ãĥ¼ãĥĦ"        0.06

Activations Density 0.006%
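
To put the density figure in context, the short calculation below combines it with the prompt configuration listed above. It assumes, and the page does not confirm, that the density was measured over this same set of prompts and that 0.006% is a rounded display value, so the result is only an order-of-magnitude estimate.

```python
# Values copied from the dashboard.
N_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.006  # displayed value, presumably rounded

# Total token positions covered by the prompt set.
total_tokens = N_PROMPTS * TOKENS_PER_PROMPT

# Rough count of positions where the feature fires, assuming the
# density refers to this prompt set (an assumption, see above).
approx_active = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens:       {total_tokens:,}")         # 3,145,728
print(f"approx. activating: {round(approx_active)}")   # 189
```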