Explanations

what they differ is

The neuron detects evaluative or contrastive cue words (e.g. “differ,” “what,” “matter,” “more,” “surprising,” “however”) that introduce questions or emphasize comparisons.

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token            Logit
"to"             -1.06
"}$."            -1.02
"eably"          -0.98
"龵"             -0.96
"addGap"         -0.94
"democracy"      -0.94
"naphthal"       -0.92
"]."             -0.92
" paillettes"    -0.92
"阏"             -0.91

Positive Logits

Token            Logit
"However"        1.22
"卻"             1.17
                 1.02
"are"            1.02
" however"       0.96
"is"             0.95
" invece"        0.91
" Equally"       0.90
"яс"             0.89
" Biaya"         0.88

Activations Density 0.021%
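
To put the density figure in context, the arithmetic below estimates how many tokens in the dashboard prompt set the neuron fires on. It is a rough sketch, not something stated on the page: it assumes "Activations Density" means the fraction of tokens across the 24,576 dashboard prompts (128 tokens each) with a non-zero activation. The function name `estimate_active_tokens` is hypothetical and used only for illustration.

```python
# Figures quoted on this page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.021  # "Activations Density 0.021%"


def estimate_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens with non-zero activation).

    Assumption: density is measured per token over the dashboard prompts.
    """
    total = num_prompts * tokens_per_prompt
    active = total * density_percent / 100
    return total, active


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {active_tokens:.0f}")
```

Under that assumption the neuron is active on only a few hundred of roughly three million tokens, which is consistent with a feature that responds to a narrow set of cue words.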