Explanations

" instructions for"

The neuron fires on explicit instruction cues that signal how-to or directive content: the word “instructions” itself, related action words (e.g. “disable,” “set,” “out”), and accompanying links or numbered steps.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token          Logit
" за"          -0.88
" además"      -0.82
"ো"            -0.73
" contested"   -0.73
"sto"          -0.72
"↵↵↵"          -0.71
"治愈"          -0.70
"дли"          -0.69
" Houses"      -0.69
"ണ്"           -0.69

Positive Logits

Token          Logit
"kilometer"    0.85
"vées"         0.81
" droite"      0.81
"illées"       0.81
"ρης"          0.80
"饬"           0.79
" Kompon"      0.77
" popping"     0.77
" sjuk"        0.76
" Camargo"     0.76

Activations Density 0.025%
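
To put the density figure in context, the arithmetic below combines it with the prompt configuration listed under Configuration. It assumes the density is the fraction of tokens in the dashboard prompts on which the neuron is active. That reading is an assumption, since the page does not define the term.

```python
# Rough arithmetic under the stated assumption about what "density" means.
NUM_PROMPTS = 24_576        # from the Configuration section
TOKENS_PER_PROMPT = 128     # from the Configuration section
DENSITY_PERCENT = 0.025     # from "Activations Density"

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
expected_active = total_tokens * DENSITY_PERCENT / 100

print(f"Total tokens in dashboard prompts: {total_tokens:,}")
print(f"Approximate tokens with activation: {expected_active:.0f}")
```

Under that assumption, the neuron is active on a few hundred of the roughly three million dashboard tokens.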