Explanations

Non-English words and numbers.

The neuron fires on mid- to high-weight content words that carry evaluative or judgmental meaning (e.g. “erasing,” “respect,” “argue,” “best,” “value,” “power,” “improved”).

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token             Logit
" might"          -1.23
"astanza"         -1.22
" didn"           -1.19
" Mostly"         -1.16
" occasionally"   -1.16
" Sometimes"      -1.16
" wasn"           -1.13
"อยาก"            -1.13
" hadn"           -1.13
"หลัง"            -1.12

Positive Logits

Token             Logit
" trente"         1.17
" günstiger"      1.16
"niger"           1.09
" россия"         1.07
"xvii"            1.07
" lahat"          1.07
" Projektu"       1.06
"zwi"             1.05
"だけではなく"      1.05
" FÜR"            1.05

Activations Density 0.123%
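
To give the density figure a sense of scale, the sketch below multiplies out the dashboard configuration (24,576 prompts of 128 tokens each) and applies the 0.123% density to it. It assumes that the density is the fraction of dashboard tokens on which the feature is active and that it was measured on exactly this prompt set; neither point is stated on the page. The function names are hypothetical and chosen only for illustration.

```python
def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Rough count of tokens with non-zero activation.

    Assumption: density is the share of tokens on which the feature fires,
    expressed as a percentage of all dashboard tokens.
    """
    return round(total * density_percent / 100)


# Figures taken from the configuration and density lines of this page.
total = total_tokens(24_576, 128)
active = estimated_active_tokens(total, 0.123)

print(f"dashboard tokens: {total:,}")
print(f"estimated active tokens: {active:,}")
```

Under those assumptions the feature would be active on only a few thousand of roughly three million tokens, which is consistent with a sparse feature.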