INDEX

Explanations

references to identity and pronouns in the context of gender

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

胆

-0.07

メラ

-0.06

หล

-0.06

_GAP

-0.06

ška

-0.06

atoi

-0.06

uren

-0.06

arse

-0.06

lien

-0.06

-ignore

-0.06

Positive Logits

dana

0.08

 precision

0.08

avoid

0.07

 sensitivity

0.07

 avoid

0.07

 usage

0.07

 sensitive

0.07

 respectful

0.07

 sensit

0.07

.scalablytyped

0.07

Activations Density 0.007%
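
To put the density figure in context, the sketch below combines it with the prompt configuration listed above. It assumes the density is the share of token positions with a non-zero activation, measured over the same 24,576 prompts of 128 tokens; the page does not state this, and the function names are illustrative only. Because 0.007% is a rounded figure, the result is an estimate.

```python
# Figures quoted on the page.
NUM_PROMPTS = 24_576        # "24,576 prompts"
TOKENS_PER_PROMPT = 128     # "128 tokens each"
DENSITY_PERCENT = 0.007     # "Activations Density 0.007%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Number of token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(num_prompts: int, tokens_per_prompt: int,
                            density_percent: float) -> float:
    """Estimated token positions where the feature fires.

    Assumption (not confirmed by the page): density is the percentage of
    token positions with a non-zero activation over this prompt set.
    """
    return total_tokens(num_prompts, tokens_per_prompt) * density_percent / 100


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)

print(f"token positions: {total:,}")           # 3,145,728
print(f"estimated active: about {active:.0f}")  # about 220
```

Under that assumption, the feature fires on roughly 220 of about 3.1 million token positions, which fits a narrow, topic-specific feature.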