Explanations

not necessarily

The neuron fires on words that introduce negation, qualification, or hedging, e.g. “not,” “might,” “may,” “while,” and similar cue words that signal contrast or uncertainty.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token    Logit
それにしても    -1.08
そも    -1.05
BIDDEN    -0.92
 télécommande    -0.91
 hvit    -0.91
 kahit    -0.90
𝟸    -0.90
他に    -0.90
そういえば    -0.88
けません    -0.88

Positive Logits

Token    Logit
 necessarily    1.37
 perfect    1.12
 конечно    1.09
but    1.09
 может    1.02
urou    1.02
 allerdings    1.00
可能    0.98
 vielleicht    0.96
jarat    0.95

Activations Density 0.030%
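
To give the density figure a sense of scale, the short Python sketch below converts it into an approximate token count using the prompt configuration listed above. It assumes that "Activations Density" means the fraction of dashboard tokens on which the neuron has a non-zero activation; the page does not define the term, so treat the result as a rough estimate. The displayed 0.030% is also rounded, and the function and constant names are illustrative only.

```python
# Figures copied from the dashboard; names are hypothetical, for illustration.
NUM_PROMPTS = 24_576        # "24,576 prompts"
TOKENS_PER_PROMPT = 128     # "128 tokens each"
DENSITY_PERCENT = 0.030     # "Activations Density 0.030%" (rounded display value)


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens covered by the dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimate how many tokens activate the neuron.

    Assumption: density = (tokens with non-zero activation) / (total tokens).
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:.0f}")
```

Under that assumption the neuron is active on roughly one token in every 3,300, which fits a unit that responds to a narrow set of hedging and negation cue words.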