INDEX

Explanations

conjunctions and lists

The neuron strongly activates on negation or prohibition cues (e.g. “forget,” “never”). It flags words expressing a negative or forbidding sense.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Not in Any Lists

Negative Logits

Token         Logit
"of"          -1.55
"ఽ"           -1.34
" infrar"     -1.27
"SFER"        -1.26
" oliveira"   -1.24
" alis"       -1.21
" rigide"     -1.19
" sedi"       -1.18
" sirop"      -1.17
" munici"     -1.16

Positive Logits

Token            Logit
"and"            2.64
"or"             2.25
"new"            1.30
" или"           1.22
" puntas"        1.20
" three"         1.14
" alebo"         1.14
" hambre"        1.13
" abundancia"    1.10
" apreciar"      1.09
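
Token/logit lists like the ones above are commonly produced by projecting a feature's output direction through the model's unembedding matrix and sorting the per-token scores. This page does not say how its values were computed, so the following is only a minimal sketch under that assumption. The vocabulary, the vectors, and the names `logit_effects` and `top_and_bottom` are all hypothetical and unrelated to the numbers shown here.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def logit_effects(direction: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Dot each token's unembedding vector with the feature direction."""
    return {
        token: sum(d * w for d, w in zip(direction, weights))
        for token, weights in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative token effects."""
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[::-1][:k]  # most negative first
    return positive, negative


# Toy, made-up data: a 2-dimensional direction and a 4-token vocabulary.
toy_direction = [1.0, -0.5]
toy_unembed = {
    "tok_a": [2.0, 0.0],
    "tok_b": [1.5, -0.5],
    "tok_c": [-1.0, 1.0],
    "tok_d": [1.0, 0.5],
}

effects = logit_effects(toy_direction, toy_unembed)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

With real model weights the same ranking step would run over the whole vocabulary, which is why rare or fragmentary tokens can show up at either end of the list.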

Activations Density 0.042%
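
As a rough sanity check on the figures above, the sketch below multiplies out the dashboard configuration (24,576 prompts of 128 tokens each) and applies the 0.042% density. It assumes that the density is the fraction of tokens on which the feature has a nonzero activation, and that it was measured over this same prompt set. Neither assumption is stated on this page, and the function names are illustrative only.

```python
# Figures copied from the dashboard configuration above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.042


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Number of token positions in the dashboard sample."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(n_tokens: int, density_percent: float) -> float:
    """Assumption: density = (tokens with nonzero activation) / (all tokens)."""
    return n_tokens * density_percent / 100.0


n_tokens = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
n_active = expected_active_tokens(n_tokens, ACTIVATION_DENSITY_PERCENT)

print(f"tokens in dashboard sample: {n_tokens:,}")
print(f"approx. tokens with nonzero activation: {n_active:,.0f}")
```

Under these assumptions the sample holds about 3.1 million tokens, of which roughly 1,300 would activate the feature.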