Explanations

willing, unwilling, prepared, would

This neuron detects constructions where a desire/volition word (e.g. willing, unwilling, prepared, refused, like, accept, bear, submit) is immediately followed by the infinitive marker “to.”
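
The pattern in this explanation can be approximated with a simple text heuristic. The sketch below is only an illustration, not how the neuron computes its activation: the word list is copied from the examples in the explanation, and the names `VOLITION_WORDS`, `PATTERN`, and `find_volition_to` are hypothetical.

```python
import re

# Hypothetical word list, taken from the examples in the explanation.
# The neuron is a learned unit, so this lookup is only a rough stand-in.
VOLITION_WORDS = [
    "willing", "unwilling", "prepared", "refused",
    "like", "accept", "bear", "submit",
]

# A volition word, then whitespace, then the infinitive marker "to".
PATTERN = re.compile(
    r"\b(" + "|".join(VOLITION_WORDS) + r")\s+to\b",
    re.IGNORECASE,
)

def find_volition_to(text):
    """Return the volition words in `text` that are directly followed by 'to'."""
    return [match.group(1).lower() for match in PATTERN.finditer(text)]

if __name__ == "__main__":
    print(find_volition_to("She was unwilling to go but prepared to stay."))
```

A heuristic like this may be useful for spot-checking whether top-activating text matches the explanation, but it will miss inflected forms and any contexts the neuron responds to that the explanation does not cover.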

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

"//"              -1.78
" will"           -1.69
" pelaksanaan"    -1.48
" españolas"      -1.45
"！！！"          -1.43
"はい"            -1.43
"ဴ"               -1.41
(blank)           -1.40
" aisladas"       -1.40
"歳の"            -1.40

Positive Logits

"THERE"           1.60
"ufficient"       1.56
"to"              1.55
" cependant"      1.50
"麁"              1.48
"however"         1.45
"bies"            1.45
"simplified"      1.40
" Anyone"         1.39
"previously"      1.38

Activations Density 0.005%
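
To put the density figure in context, the arithmetic below estimates how many tokens the neuron fires on. It assumes the density is measured over the dashboard prompt set listed under Configuration (24,576 prompts of 128 tokens each), which the page does not state, and it treats the displayed 0.005% as exact although it is probably rounded. The variable names are illustrative.

```python
# Figures from the Configuration section of this page.
prompts = 24_576
tokens_per_prompt = 128

# Displayed activation density, converted from percent to a fraction.
# Assumption: the density is computed over the dashboard prompts above.
density = 0.005 / 100

total_tokens = prompts * tokens_per_prompt
expected_active_tokens = total_tokens * density

print(f"total tokens: {total_tokens}")
print(f"approx. activating tokens: {round(expected_active_tokens)}")
```

Under these assumptions the neuron activates on roughly 157 of about 3.1 million tokens, which fits a unit that responds to one narrow construction.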