INDEX

Explanations

safety

The neuron fires on occurrences of the word “Safety,” in particular when it appears as a heading or title.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

[no visible token]    -2.73
–    -2.23
Of    -2.23
！"    -2.17
of    -2.13
ar    -2.06
 где    -2.05
[no visible token]    -2.02
[no visible token]    -2.00

Positive Logits

趼    2.41
 gewisser    2.33
 allgeme    2.30
 gän    2.30
[no visible token]    2.09
曖    2.08
[no visible token]    2.06
 zwölf    2.05
衚    2.05
of    2.03

Activations Density 0.023%
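
To put the density figure in context, the sketch below estimates how many token positions it corresponds to. It assumes that activation density means the share of token positions with a nonzero activation and that it was measured over the prompt set listed under Configuration; neither assumption is stated on the dashboard. The function name `estimated_active_tokens` is made up for illustration.

```python
# Figures copied from the dashboard above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.023  # rounded value as displayed


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total token positions, estimated positions with nonzero activation).

    Assumption: density is the share of token positions where the
    activation is nonzero, expressed as a percentage.
    """
    total = num_prompts * tokens_per_prompt
    return total, total * density_percent / 100


total_tokens, active_tokens = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total token positions: {total_tokens:,}")
print(f"estimated activating positions: {active_tokens:.0f}")
```

Because the displayed density is rounded to three decimal places, the result is only an order-of-magnitude estimate: a few hundred activating positions out of roughly three million.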