Explanations

cannot and will not

The neuron activates on first-person evaluative or emphatic expressions (e.g., “can’t,” “cannot,” “couldn’t,” “believe,” “recommend,” “begin to”) that signal a strong personal reaction or endorsement.

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

(token not shown)   1.08
(token not shown)   1.06
(token not shown)   0.94
(token not shown)   0.86
"ও"                 0.85
" lahko"            0.82
"want"              0.80
"wil"               0.80
"hibition"          0.79

Positive Logits

" videomuzda"       0.93
" ஆகவே"             0.93
"𐰆"                 0.90
"ट्टर"               0.87
"з"                 0.85
"⿷"                 0.84
"ᖕ"                 0.83
"$-$,"              0.82
"től"               0.82
"ज़न"                0.82
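The logit lists rank vocabulary tokens by how strongly the feature pushes their output logits up or down. The dashboard does not say how these scores are computed; a common approach, assumed here, is to project the feature's decoder direction onto the model's unembedding matrix and keep the largest and smallest entries. The sketch uses plain Python lists, and `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical illustrations, not Neuronpedia's code.

```python
"""Sketch: rank vocabulary tokens by one feature's direct logit effect.

Assumption (not stated on the dashboard): the effect on token t is the dot
product of the feature's decoder direction with column t of the unembedding
matrix W_U, i.e. a logit-lens style readout.
"""
from typing import List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Return decoder_dir . W_U[:, t] for every vocabulary index t.

    w_u is laid out as d_model rows by vocab_size columns.
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most-promoted and the k most-suppressed tokens."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]  # largest first
    negative = sorted(pairs, key=lambda p: p[1])[:k]                # most negative first
    return positive, negative


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of 3 tokens.
    vocab = ["want", "cannot", "believe"]
    decoder_dir = [1.0, -0.5]
    w_u = [[0.2, 1.0, 0.5],
           [0.8, -0.4, 0.1]]
    pos, neg = top_and_bottom(logit_effects(decoder_dir, w_u), vocab, 2)
    print([tok for tok, _ in pos])  # → ['cannot', 'believe']
    print([tok for tok, _ in neg])  # → ['want', 'believe']
```

In this sketch suppressed tokens carry negative scores, whereas the dashboard's Negative Logits list appears to show unsigned magnitudes; a real model would replace the toy lists with its actual decoder row and unembedding matrix.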

Activations Density 0.088%
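Activation density is read here as the share of token positions on which the feature fires. That definition, the zero threshold, and the assumption that the 0.088% figure was measured over the configured sample of 392,802 prompts × 256 tokens are not stated on the dashboard. Under those assumptions, the sketch computes density from per-token activations and converts the reported percentage into a rough count of active positions (about 88,490 of 100,557,312). The function name `activation_density` is hypothetical.

```python
"""Sketch: activation density as the share of token positions where a feature fires.

Assumptions (not confirmed by the dashboard): a position counts as active when
its activation is strictly above `threshold` (default 0.0), and the reported
0.088% was measured over the configured 392,802 x 256-token sample.
"""
from typing import Iterable, Sequence


def activation_density(activations: Iterable[Sequence[float]],
                       threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    `activations` holds one sequence of per-token activations per prompt.
    """
    active = total = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Scale of the dashboard's configuration (assumed to be the density sample).
TOTAL_TOKENS = 392_802 * 256   # 100,557,312 token positions
DENSITY = 0.088 / 100          # 0.088% expressed as a fraction

if __name__ == "__main__":
    print(f"{TOTAL_TOKENS:,} tokens")                             # → 100,557,312 tokens
    print(f"~{round(TOTAL_TOKENS * DENSITY):,} active positions")  # → ~88,490 active positions
```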