Explanations

character roles and relationships

The neuron strongly activates on words that denote people’s demographic or identity attributes (e.g. race, gender, sexual orientation, disability).

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token    Logit
當時    -1.16
誰か    -1.04
 seseorang    -1.04
嚐    -1.02
於是    -0.98
掙    -0.97
樣子    -0.94
 fisico    -0.94
 morons    -0.93
オム    -0.92

Positive Logits

Token    Logit
 protective    1.23
gay    1.23
ner    1.15
 alcoholic    1.12
African    1.09
 sche    1.06
 African    1.04
 drug    1.03
 loud    1.02
 Buddhist    1.02

Activations Density 0.045%
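
To give a sense of scale for the density figure, the sketch below combines it with the prompt configuration listed under Configuration. It assumes, which the page does not state, that the density is the fraction of token positions across those dashboard prompts at which the feature is active. The function name `estimate_active_tokens` is hypothetical and used only for illustration.

```python
def estimate_active_tokens(n_prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> tuple[int, float]:
    """Return (total token positions, estimated active positions).

    Assumption: density_percent is the share of all token positions
    in the dashboard prompts where the feature has a nonzero activation.
    """
    total_tokens = n_prompts * tokens_per_prompt
    expected_active = total_tokens * (density_percent / 100.0)
    return total_tokens, expected_active


# Values taken from the page: 24,576 prompts, 128 tokens each, density 0.045%.
total, active = estimate_active_tokens(24_576, 128, 0.045)
print(f"total token positions: {total:,}")
print(f"estimated active positions: {active:,.0f}")
```

Under that assumption, the feature would fire on roughly 1,400 of about 3.1 million token positions, which is consistent with a sparse, narrowly selective feature.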