Explanations

actions involving himself or herself

The neuron flags words expressing internal states or attitudes, especially mental-state verbs (e.g. wanted, considered, believing).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

"娶"  -1.53
" boyhood"  -1.34
" istrinya"  -1.30
"妻"  -1.30
"给她"  -1.23
"its"  -1.20
" établie"  -1.18
"其"  -1.13
"对她"  -1.13
" himself"  -1.09

Positive Logits

" herself"  5.16
"herself"  3.58
" сама"  2.25
" نفسها"  2.00
" powinna"  1.55
" elle"  1.51
" została"  1.40
" должна"  1.40
"him"  1.38
"сама"  1.37

Activations Density 0.045%
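
To give a rough sense of scale, the density figure can be combined with the prompt configuration quoted above. The sketch below is only illustrative: it assumes the density is the fraction of all dashboard tokens on which the feature is active, which the page does not state explicitly, and the function names are hypothetical.

```python
# Figures copied from the dashboard; the interpretation of "density" as
# "share of all dashboard tokens with a nonzero activation" is an assumption.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.045  # "Activations Density"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density_percent: float) -> float:
    """Approximate count of tokens on which the feature fires."""
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"approx. active tokens: {active:,.0f}")
```

Under that assumption, the feature would be active on roughly 1,400 of about 3.1 million tokens, which is consistent with a sparse feature that fires only in narrow contexts.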