Explanations

men, women, children

This neuron primarily detects gender-specific category labels (e.g. “Women’s” or “Men’s”).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token          Logit
"for"          -1.27
"in"           -1.02
" arranged"    -0.90
"剖"           -0.85
" arranging"   -0.83
" लोगों"        -0.80
" appunt"      -0.79
"简介"         -0.79
"municipi"     -0.77
" Políticas"   -0.76

Positive Logits

Token          Logit
" mannen"      0.92
" ženy"        0.91
" StyleSheet"  0.91
"teens"        0.90
" nécess"      0.90
"tà"           0.89
"boys"         0.88
" aiment"      0.88
" parfü"       0.87
"ленного"      0.87

Activations Density 0.019%
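
To put the density figure in perspective, the short sketch below combines it with the dashboard configuration (24,576 prompts of 128 tokens each) to estimate how many tokens the neuron fires on. It assumes that "activations density" means the fraction of dashboard tokens with a non-zero activation, which the page does not state explicitly. The function names are illustrative only, and the result is approximate because 0.019% is already a rounded value.

```python
# Figures copied from the dashboard configuration and density readout.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.019  # as displayed; already rounded


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimate how many tokens activate the neuron.

    ASSUMPTION: density is the share of tokens with non-zero activation,
    expressed as a percentage. This is not confirmed by the page itself.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:.0f}")
```

Under that assumption, the neuron would fire on roughly 600 of about 3.1 million tokens, which fits a feature that responds only to a narrow class of labels.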