Explanations

we followed by verb

The neuron strongly activates on the first‐person plural pronoun “we.”

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token  Logit
" può"  -1.31
"ίνει"  -1.27
" bude"  -1.17
"Ecotoxicity"  -1.16
" می‌تواند"  -1.16
" namorados"  -1.16
"archiviato"  -1.15
"mbal"  -1.14
" sarà"  -1.12
"桄"  -1.10

Positive Logits

Token  Logit
"are"  3.25
" have"  2.14
"all"  1.91
" were"  1.84
" ourselves"  1.63
" humans"  1.51
" aren"  1.36
" weren"  1.24
" نباش"  1.23
" потрі"  1.19
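
Logit lists like the ones above are commonly produced by projecting a feature's output direction through the model's unembedding matrix and ranking tokens by the resulting score. The sketch below illustrates that idea only. It is an assumption about the general technique, not a description of how this dashboard computed its numbers, and the names `logit_effects`, `top_and_bottom`, `direction`, and `unembed`, along with all weights, are hypothetical toy values.

```python
def logit_effects(direction, unembed, vocab):
    """Score each token by the dot product of the feature's output
    direction with that token's column of the unembedding matrix.

    direction: list of d_model floats (hypothetical feature direction)
    unembed:   d_model rows, each a list of len(vocab) floats
    """
    effects = []
    for tok_id, token in enumerate(vocab):
        score = sum(d * row[tok_id] for d, row in zip(direction, unembed))
        effects.append((token, score))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[-k:][::-1]  # most negative first
    return positive, negative


# Toy example: the tokens come from the lists above, the weights are made up.
vocab = ["are", " have", " può", " bude"]
direction = [1.0, 2.0]
unembed = [
    [2.0, 1.0, -1.0, 0.0],
    [0.5, 0.0, -0.5, -0.75],
]

positive, negative = top_and_bottom(logit_effects(direction, unembed, vocab), k=2)
print("positive:", positive)
print("negative:", negative)
```

Leading spaces are kept inside the token strings because, in most tokenizers, " have" and "have" are distinct vocabulary entries.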

Activations Density 0.063%
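
To put the density figure in context, the short calculation below combines it with the prompt configuration listed under Configuration. It assumes that the density is the fraction of all dashboard tokens on which the neuron has a nonzero activation; that interpretation is an assumption, and the variable names are illustrative.

```python
# Values taken from the Configuration section and the density line.
n_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.063

# Total number of tokens across all dashboard prompts.
total_tokens = n_prompts * tokens_per_prompt

# Approximate number of tokens on which the neuron fires,
# assuming density is measured over all of those tokens.
expected_active = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {expected_active:,.0f}")
```

Under that assumption, the neuron fires on roughly two thousand of about 3.1 million tokens, which fits a feature tied to a single pronoun.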