Explanations

apologies and expressions of newness

The neuron fires on meta-commentary by the author, especially hedges, apologies, and first-post remarks in the style of “hope I don’t screw this up” that signal uncertainty or inexperience.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token          Logit
"we"           -1.14
"how"          -1.11
" then"        -1.09
"ֻ"            -1.01
" diverso"     -0.96
"We"           -0.96
" السياس"      -0.96
" things"      -0.95
" secretions"  -0.94
" that"        -0.93

Positive Logits

Token          Logit
"皆さん"        1.17
"my"           1.09
"军团"          1.08
" frumo"       1.08
"Sharing"      1.07
" tuturor"     1.05
" oamen"       1.04
"diese"        1.04
" знают"       1.00
"ślę"          1.00
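
The page does not say how the logit lists above are computed. A common convention, assumed here, is to project the neuron's output direction through the model's unembedding matrix and report the tokens with the largest negative and positive values. The sketch below illustrates that idea on toy data; `top_logit_effects`, `w_out`, `W_U`, and the toy vocabulary are hypothetical names and values, not taken from the dashboard.

```python
import numpy as np

def top_logit_effects(w_out, W_U, vocab, k=10):
    """Return the k most negative and k most positive logit effects.

    w_out : (d_model,) hypothetical output direction of the neuron
    W_U   : (d_model, vocab_size) hypothetical unembedding matrix
    vocab : list of token strings, one per vocabulary index
    """
    effects = w_out @ W_U          # one logit effect per vocabulary token
    order = np.argsort(effects)    # ascending, so most negative comes first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy data for illustration only (assumed, not from the dashboard).
toy_vocab = ["a", "b", "c", "d"]
toy_w_out = np.array([1.0, 0.0])
toy_W_U = np.array([[-1.0, 0.5, 2.0, -3.0],
                    [0.0, 0.0, 0.0, 0.0]])

negative, positive = top_logit_effects(toy_w_out, toy_W_U, toy_vocab, k=2)
print(negative)
print(positive)
```

Under this assumption, a token in the positive list is one whose logit the neuron pushes up when it fires, and a token in the negative list is one it pushes down.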

Activations Density 0.041%
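
To give the density figure a sense of scale, the sketch below combines it with the configuration stated earlier (24,576 prompts of 128 tokens each). It assumes, which the page does not state, that the density is the fraction of those dashboard tokens on which the neuron has a nonzero activation. The helper name `expected_active_tokens` is hypothetical.

```python
def expected_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens on which the neuron is active).

    Assumes density_percent is the percentage of all tokens with a nonzero
    activation; the dashboard does not spell out this definition.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens

total_tokens, active_tokens = expected_active_tokens(24_576, 128, 0.041)
print(total_tokens)          # total tokens across all prompts
print(round(active_tokens))  # rough count of tokens with nonzero activation
```

Under that assumption, the neuron would be active on roughly 1,290 of about 3.1 million tokens, which is consistent with a feature that fires only on a narrow kind of text.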