Explanations

privileged people and wealth

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token           Logit
"xhr"           -0.89
"ularis"        -0.82
" tubuh"        -0.79
"greenrobot"    -0.76
" maglia"       -0.76
"Ԁ"             -0.76
" maestro"      -0.75
"olor"          -0.75
" stronger"     -0.75
"掖"            -0.75

Positive Logits

Token           Logit
" privileged"   1.80
" privilege"    1.62
" Privilege"    1.59
"sno"           1.55
"privileged"    1.54
"Privilege"     1.54
" Privile"      1.53
" elite"        1.53
"Privile"       1.53
" wealthy"      1.52

Activations Density: 0.062%