Explanations

social stigma and discrimination

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

"entropy"  0.38
" entropy"  0.38
" acet"  0.38
"熵"  0.38
"闓"  0.37
" rethinking"  0.37
"ᠠ"  0.37
"こ"  0.37
"允"  0.37
"ewe"  0.37

Positive Logits

" harassment"  0.74
" bullies"  0.73
" persecution"  0.71
" stigmat"  0.70
" bullying"  0.69
" ridicule"  0.66
" bullied"  0.65
" stigma"  0.65
" discriminatory"  0.64
" ostr"  0.63

Activation Density: 0.125%