INDEX

Explanations

phrases indicating biased or preconceived attitudes and decisions

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B
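The dashboard sample size follows directly from the two prompt settings above. Below is a minimal Python check of the arithmetic. The constant names are illustrative only and are not part of any dashboard API.

```python
# Dashboard sample size, taken from the "Prompts (Dashboard)" setting above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128

# Every prompt contributes the same fixed number of token positions.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
print(f"{total_tokens:,} tokens")  # 3,145,728 tokens
```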

Negative Logits

ВС  -0.06
 afterward  -0.06
nech  -0.06
ráv  -0.06
emy  -0.06
 interim  -0.06
oord  -0.06
lac  -0.06
レー  -0.06
 passing  -0.05

Positive Logits

 bevor  0.11
 before  0.10
 antes  0.10
before  0.10
 trước  0.10
 Already  0.10
 anticipated  0.10
 already  0.10
 BEFORE  0.09
早  0.09
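The page does not say how the two logit lists are computed. A common approach for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and read off the lowest and highest entries. The sketch below assumes exactly that. `logit_effects`, `w_dec_row`, and `w_u` are hypothetical names, and the toy matrices stand in for real model weights.

```python
import numpy as np


def logit_effects(w_dec_row, w_u, vocab, k=10):
    """Return the k lowest and k highest (token, score) pairs for one feature.

    Hypothetical sketch: assumes the logit lists come from projecting the
    feature's decoder vector, shape (d_model,), through the unembedding
    matrix `w_u`, shape (d_model, vocab_size).
    """
    logits = w_dec_row @ w_u      # one score per vocabulary token
    order = np.argsort(logits)    # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 2 and a four-token vocabulary.
vocab = [" before", " afterward", " the", " already"]
w_u = np.array([
    [0.10, -0.06, 0.00, 0.09],
    [0.50, 0.50, 0.50, 0.50],
])
w_dec_row = np.array([1.0, 0.0])  # selects the first row of w_u

negative, positive = logit_effects(w_dec_row, w_u, vocab, k=2)
print(negative)  # [(' afterward', -0.06), (' the', 0.0)]
print(positive)  # [(' before', 0.1), (' already', 0.09)]
```

With real weights, `k=10` would produce two lists of the same length as the ones shown above. Some pipelines also apply the model's final layer norm or rescale decoder rows before projecting; this sketch omits both.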

Activation Density: 0.051%
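Activation density is usually reported as the share of token positions on which a feature's activation is above zero. The page does not say which token sample the 0.051% figure was measured on, so the sketch below treats that as an open assumption. The `activation_density` helper is hypothetical. The last lines only show what the figure would imply if it had been measured over the 24,576 × 128 = 3,145,728 dashboard tokens.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature activation exceeds `threshold`.

    `acts` holds one activation per token, e.g. shape (num_prompts, tokens_per_prompt).
    """
    return float(np.mean(acts > threshold))


# Toy check: 4 prompts x 128 tokens, with the feature firing on exactly one token.
toy = np.zeros((4, 128))
toy[0, 5] = 1.2
print(f"{activation_density(toy):.4%}")  # 0.1953%

# Hypothetical reading: 0.051% measured over the dashboard sample.
dashboard_tokens = 24_576 * 128
expected_active = round(0.00051 * dashboard_tokens)
print(expected_active)  # 1604
```

Under that reading, the feature would be active on roughly 1,600 of the roughly 3.1 million sampled tokens. If the density was measured on a different sample, only the proportion carries over, not the count.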