Explanations

clarifying phrases

This neuron fires on hedging or clarifying discourse markers such as “to be sure” and “to be clear.”

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token          Logit
"to"           -1.69
"业主"         -1.66
"と言えば"     -1.57
"のお知らせ"   -1.55
"%;"           -1.54
"甞"           -1.52
"ceptable"     -1.48
"心に"         -1.47
"我又"         -1.47

Positive Logits

Token            Logit
"of"             1.95
                 1.92
                 1.80
" also"          1.59
"But"            1.56
"骤"             1.52
"↑↑↑</"          1.51
" This"          1.45
" únicamente"    1.45

Activations Density 0.005%
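
To give a sense of scale, the density figure can be combined with the prompt configuration above. The sketch below is illustrative only: it assumes the density is the fraction of token positions with a nonzero activation, measured over the same 24,576 prompts of 128 tokens each. The page does not state either point, and the function name is hypothetical.

```python
def estimate_active_tokens(num_prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> tuple[int, float]:
    """Return (total token positions, estimated number of active positions).

    Assumption: density_percent is the share of token positions on which
    the neuron activates, taken over the whole dashboard prompt set.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens


# Values copied from the Configuration and Activations Density entries.
total, active = estimate_active_tokens(24_576, 128, 0.005)
print(total)          # 3145728
print(round(active))  # 157
```

Under these assumptions the neuron would be active on roughly 157 of about 3.1 million token positions, which fits a feature tied to a narrow set of phrases.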