oppression, discrimination, exploitation

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token              Logit
"ية"               0.57
" ativos"          0.52
"После"            0.52
(token missing)    0.50
(token missing)    0.49
"エ"               0.49
" árvore"          0.48
" after"           0.47

Positive Logits

Token              Logit
" oppression"      0.70
" oppressed"       0.61
" coercive"        0.60
" harassment"      0.58
" restrictive"     0.56
" abuse"           0.56
" discrimination"  0.56
" misuse"          0.55
" discriminatory"  0.55
" oppressive"      0.54

Activations Density 0.443%