INDEX

Explanations

questioning if you are

The neuron appears to detect high-frequency function words, especially personal pronouns (you, we, I) and common auxiliary or modal verbs (are, will), along with frequent neighboring words such as "not" and "trying".

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token          Value
"Spending"     0.63
" Spending"    0.63
" เพียง"        0.59
" spending"    0.55
"再加上"        0.55
" IMDb"        0.54
"spending"     0.54
"现在的"        0.54
" devoting"    0.53
"édéric"       0.52

Positive Logits

Token          Value
" naughty"     0.79
" loosing"     0.77
" murderers"   0.75
" VERY"        0.74
" insane"      0.73
" murderous"   0.73
" stupid"      0.71
" idiots"      0.71
" killers"     0.70
" stink"       0.70

Activations Density 0.100%
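
To give the density figure a sense of scale, the short sketch below combines it with the prompt counts listed under Configuration. It assumes that "Activations Density" means the fraction of all dashboard tokens on which the neuron has a nonzero activation; the page does not define the term, so treat the result as a rough estimate. The function and constant names are hypothetical and chosen only for illustration.

```python
# Figures copied from the dashboard page.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
DENSITY_PERCENT = 0.100  # "Activations Density 0.100%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(token_count: int, density_percent: float) -> int:
    """Estimate how many tokens activate the neuron.

    ASSUMPTION: density is the share of tokens with nonzero activation.
    """
    return round(token_count * density_percent / 100)


if __name__ == "__main__":
    tokens = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    active = estimated_active_tokens(tokens, DENSITY_PERCENT)
    print(f"Total tokens in dashboard: {tokens:,}")
    print(f"Estimated activating tokens: {active:,}")
```

Under that assumption, the neuron fires on roughly one token in a thousand, or about 100,000 of the roughly 100 million tokens in the dashboard.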