Explanations

inflict, afflict, inflection

The neuron fires on words related to inflicting or causing harm (e.g. inflict, inflicted, afflict, infliction).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

珢  -2.63
síða  -2.59
actéristi  -2.48
秝  -2.41
ፔ  -2.30
繽  -2.25
父の  -2.25
ᚔ  -2.22
Ꮷ  -2.20
艄  -2.20

Positive Logits

↵  3.22
“  2.77
(no visible token)  2.66
也  2.41
?"  2.36
The  2.28
 There  2.16
’  2.14
(no visible token)  2.08
")  2.05

Activations Density 0.003%
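
To put the density figure in context, the short sketch below estimates how many tokens in the dashboard sample the neuron fires on. It assumes that activation density means the fraction of sampled tokens with a nonzero activation, and that the sample is exactly the 24,576 prompts of 128 tokens listed under Configuration. Neither assumption is confirmed by the page, and the 0.003% figure is rounded, so the result is only a rough estimate. The function names are illustrative.

```python
# Figures copied from the dashboard configuration.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.003  # reported activation density (rounded on the page)


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Number of token positions in the dashboard sample."""
    return num_prompts * tokens_per_prompt


def approx_active_tokens(total: int, density_percent: float) -> float:
    """Estimated token positions with nonzero activation.

    Assumes density = active tokens / total tokens, given as a percentage.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = approx_active_tokens(total, DENSITY_PERCENT)
print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:.0f}")
```

Under these assumptions the neuron is active on roughly a hundred tokens out of about three million, which fits an explanation restricted to a small family of words such as inflict and afflict.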