Explanations

her, she, his, him

The neuron activates primarily on the words “HER,” “His,” and “her,” i.e., gendered possessive pronouns, often capitalized.

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token            Logit
"of"             -1.66
"');"            -1.55
"overline"       -1.48
"我们的"         -1.48
" forState"      -1.47
" innych"        -1.44
"assertEqual"    -1.43
"olução"         -1.43
"され"           -1.41
"olver"          -1.41

Positive Logits

Token            Logit
"and"            1.70
" ‘‘"            1.58
"能在"           1.48
"’，"            1.41
"but"            1.40
" amplia"        1.39
" türlü"         1.38
" simplement"    1.33
"simply"         1.31
" ayudó"         1.30

Activations Density: 0.032%
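
To put the density figure in context, the short sketch below works out roughly how many tokens it corresponds to. It rests on two assumptions that the page does not state: that density means the fraction of tokens on which the neuron has a nonzero activation, and that it was measured over the same 24,576 prompts of 128 tokens listed under Configuration. The function names are hypothetical and used only for illustration.

```python
# Figures quoted on this page (Configuration and Activations Density).
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.032 / 100  # 0.032% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density: float) -> int:
    """Approximate count of tokens with nonzero activation.

    ASSUMPTION: density is the fraction of tokens on which the neuron
    fires, measured over the same prompt set. The page does not confirm this.
    """
    return round(total * density)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, ACTIVATION_DENSITY)
print(f"total tokens: {total}, approx. active tokens: {active}")
```

Under those assumptions the neuron would fire on roughly a thousand of about three million tokens, which fits an explanation centered on a handful of specific pronoun forms.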