Explanations

man, woman and human

This neuron detects explicit references to being human or male, such as the words “human” and “man.”

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token            Logit
"or"             -1.73
"sejarah"        -1.39
" lecker"        -1.26
" süß"           -1.21
"the"            -1.19
" mädchen"       -1.15
" similar"       -1.14
" gebruikers"    -1.13
" serca"         -1.12
" oryginal"      -1.12

Positive Logits

Token            Logit
"☆、"            1.42
"even"           1.24
"since"          1.23
" inov"          1.19
" Manifesto"     1.18
"with"           1.17
" Deve"          1.16
">');"           1.16
" メニュー"      1.16
"これからも"     1.15

Activations Density: 0.048%
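
To put the density figure in context, the short sketch below combines it with the dashboard configuration (24,576 prompts of 128 tokens each). It assumes that the activation density is the fraction of dashboard tokens on which the neuron fires at all; the page does not state this definition, so treat the result as a rough estimate rather than a reported number. The variable names are illustrative only.

```python
# Figures copied from the dashboard configuration and the density readout.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.048 / 100  # 0.048% expressed as a fraction

# Assumption (not stated on the page):
# density = tokens with nonzero activation / all dashboard tokens.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
approx_active_tokens = round(total_tokens * ACTIVATION_DENSITY)

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {approx_active_tokens:,}")
```

Under that assumption, the neuron would fire on roughly 1,500 of about 3.1 million tokens, which is consistent with a sparse detector for a few specific words.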