INDEX

Explanations

statements about identity and belonging, particularly related to marginalized groups

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

colo

-0.08

anter

-0.06

inton

-0.06

itti

-0.06

 teklif

-0.06

нет

-0.06

 Glas

-0.06

çģ

-0.06

.writ

-0.06

avana

-0.06

Positive Logits

 instanceof

0.08

isa

0.07

±

0.06

_IS

0.06

種

0.06

 himself

0.06

属于

0.06

Į

0.06

 classified

0.06

anga

0.06
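
Some tokens in the logit lists are not ASCII. Byte-level BPE vocabularies usually store such tokens as mapped byte characters, so a raw vocabulary entry can look like mojibake (for example, the raw form of "нет" is "Ð½ÐµÑĤ"). The sketch below is one way to decode raw entries, assuming the tokens come from a GPT-2-style byte-level BPE vocabulary; the page does not name the tokenizer, so that is an assumption. The helper names `bytes_to_unicode` and `decode_token` are illustrative and are not part of the dashboard.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> display-character table (assumed tokenizer)."""
    # Printable bytes map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: display character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(raw: str) -> str:
    """Map each display character back to its byte, then decode the bytes as UTF-8."""
    data = bytes(CHAR_TO_BYTE[ch] for ch in raw)
    # Tokens that end mid-character cannot be decoded on their own;
    # they come back containing U+FFFD instead of raising an error.
    return data.decode("utf-8", errors="replace")


# Raw entries written with escapes so the exact code points are unambiguous.
raw_tokens = [
    "\u00d0\u00bd\u00d0\u00b5\u00d1\u0124",        # displayed as Ð½ÐµÑĤ
    "\u00e7\u00a8\u00ae",                          # displayed as ç¨®
    "\u00e5\u00b1\u0140\u00e4\u00ba\u0130",        # displayed as å±ŀäºİ
]
decoded = [decode_token(t) for t in raw_tokens]
print(decoded)
```

Entries that hold only part of a multi-byte character (two of the tokens listed here appear to be such fragments) cannot be recovered in isolation, which is why the sketch decodes with `errors="replace"` rather than guessing the missing bytes.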

Activations Density 0.067%
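
To put the density figure in context, here is a rough calculation of how many token positions it corresponds to. It assumes the density was measured over the same prompt set listed under Configuration (24,576 prompts of 128 tokens each), which the page does not state explicitly. The variable names are illustrative.

```python
# Figures taken from the Configuration section and the density line.
num_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.067

# Total token positions in the prompt set.
total_tokens = num_prompts * tokens_per_prompt

# Assumption: density = fraction of token positions with a nonzero activation.
expected_active = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. active positions: {expected_active:,.0f}")
```

Under that assumption the feature fires on roughly two thousand of about 3.1 million token positions, which is consistent with a sparse feature.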