INDEX

Explanations

Language related to dehumanization and objectification

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): cerebras/SlimPajama-627B

Negative Logits

"itra"  -0.07
"orta"  -0.07
"ieri"  -0.07
"gers"  -0.07
"asc"  -0.07
" напрям"  -0.07
" Priority"  -0.06
"benchmark"  -0.06
"Trademark"  -0.06
" seat"  -0.06

Positive Logits

"-Za"  0.06
"öh"  0.06
" preceded"  0.06
"对方"  0.06
" Followers"  0.06
"_similarity"  0.06
" 그래"  0.06
" Soup"  0.06
" cheap"  0.06
" [];\r\n"  0.06
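Positive and negative logit lists like these are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and reading off the largest and smallest entries. The sketch below assumes that method; the page does not state the exact procedure (for example, how the final LayerNorm is handled), and `feature_logit_effects`, `w_dec_row`, and `W_U` are hypothetical names used only for illustration.

```python
import numpy as np


def feature_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    w_dec_row: (d_model,) decoder vector of one SAE feature (hypothetical input).
    W_U:       (d_model, vocab_size) unembedding matrix of the model.
    vocab:     list mapping token id -> token string.
    Returns (top_k, bottom_k) lists of (token, value) pairs.
    Assumption: the final LayerNorm is ignored here.
    """
    logits = np.asarray(w_dec_row) @ np.asarray(W_U)  # shape (vocab_size,)
    order = np.argsort(logits)  # token ids sorted by ascending logit
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top, bottom


# Toy example: 2-dim residual stream, 3-token vocabulary.
W_U = np.array([[1.0, 0.0, -1.0],
                [0.0, 1.0, 0.0]])
w_dec_row = np.array([1.0, 2.0])  # projected logits = [1.0, 2.0, -1.0]
top, bottom = feature_logit_effects(w_dec_row, W_U, ["a", "b", "c"], k=2)
print(top)     # [('b', 2.0), ('a', 1.0)]
print(bottom)  # [('c', -1.0), ('a', 1.0)]
```

Sorting once and slicing both ends yields the positive and negative lists from a single argsort.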

Activations Density: 0.004%
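Activation density is conventionally the fraction of token positions on which a feature fires (activation above zero). The sketch below assumes that definition and, purely for illustration, assumes the density was measured over the 24,576 prompts × 128 tokens listed under Configuration; the page does not say which token set was used. `activation_density` and `expected_active_tokens` are hypothetical helpers.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    `acts` is any array of per-token feature activations, e.g. shape
    (n_prompts, tokens_per_prompt). A threshold of 0.0 is an assumption.
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


def expected_active_tokens(density_pct, n_prompts, tokens_per_prompt):
    """Convert a density given in percent into an approximate token count."""
    total = n_prompts * tokens_per_prompt
    return total, density_pct / 100 * total


# Toy check: 1 of 4 positions is active, so density is 25%.
print(activation_density([[0.0, 1.3], [0.0, 0.0]]))  # 0.25

# Dashboard figures, under the assumption stated above.
total, active = expected_active_tokens(0.004, 24_576, 128)
print(total)          # 3145728
print(round(active))  # 126
```

Under that assumption, a density of 0.004% means the feature fires on roughly 126 of about 3.1 million tokens, so it is very sparse.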