Explanations

“would” followed by a verb

The neuron activates strongly on the modal verb “would.”

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

"is"               -1.76
"of"               -1.57
"</h1>"            -1.51
" increases"       -1.49
"</h2>"            -1.48
"or"               -1.42
" multifaceted"    -1.39
" increased"       -1.38
"りますが"          -1.36

Positive Logits

"⤒"                1.79
"iffance"          1.67
"Its"              1.53
" różnych"         1.48
                   1.46
"蜮"               1.44
"athons"           1.44
"にほんブログ村"    1.38
"biendo"           1.36

Activations Density 0.060%
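
For a sense of scale, the density figure can be turned into an approximate token count. The sketch below assumes the density is the fraction of all dashboard tokens (24,576 prompts of 128 tokens each) on which the neuron has a non-zero activation. That reading is an assumption, not something the page states, and the variable names are illustrative.

```python
# Assumption: "Activations Density" is the fraction of dashboard tokens
# on which the neuron fires (non-zero activation).
NUM_PROMPTS = 24_576        # from the dashboard configuration
TOKENS_PER_PROMPT = 128     # from the dashboard configuration
DENSITY_PERCENT = 0.060     # reported activation density

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
# Round to a whole token count; the reported density is itself rounded.
approx_active_tokens = round(total_tokens * DENSITY_PERCENT / 100)

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {approx_active_tokens:,}")
```

Under that assumption, the neuron fires on roughly 1,900 of about 3.1 million tokens.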