INDEX

Explanations

The neuron strongly activates on text about self-worth and confidence: phrases where the author or audience is urged to believe in, prove, or question their own value.

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

")，"    0.50
")،"    0.50
"，"     0.46
"),"    0.46
")、"    0.46
"ӵ"     0.45
"(),"   0.45
".),"   0.41
"()),"  0.41
"='',"  0.41

Positive Logits

"But"       0.97
" That"     0.93
"And"       0.91
"Who"       0.90
" What"     0.84
" Perhaps"  0.84
"It"        0.84
" Maybe"    0.82
" Nobody"   0.82
"Isn"       0.79

Activations Density 2.269%
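
The dashboard does not define how the density figure is computed. A minimal sketch, assuming it means the fraction of token positions on which the neuron's activation is above zero, might look like the following. The function name `activation_density`, the `threshold` parameter, and the toy activation values are all hypothetical and are not taken from the dashboard; only the prompt count and the tokens-per-prompt figure come from the Configuration section.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" counts positions with activation > 0. The
    dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Token count implied by the Configuration section:
# number of prompts times tokens per prompt.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Toy activations for illustration only (made up, not dashboard data).
toy = [0.0, 0.0, 1.3, 0.0, 0.2, 0.0, 0.0, 0.0]

print(f"toy density: {activation_density(toy):.3%}")
print(f"total tokens: {total_tokens:,}")
```

Under that same assumption, a density of 2.269% over all of the configured tokens would correspond to roughly 2.28 million active positions, though the dashboard does not say whether the density is measured over the full prompt set.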