INDEX

Explanations

rejection and rejected

This neuron primarily detects occurrences of the root “reject” (e.g. reject, rejection, rejecting) or closely related disapproval terms.
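
As a rough way to sanity-check the explanation above against raw text, the following is a minimal sketch of a plain-text proxy for the "reject" root. It is an assumption for illustration only: the neuron operates on model tokens and activations, not on regular expressions, and the names `REJECT_ROOT` and `find_reject_terms` are hypothetical. The sketch does not cover the "closely related disapproval terms" mentioned in the explanation.

```python
import re

# Hypothetical proxy pattern: any word that contains the root "reject".
# This only approximates the explanation; it is not how the neuron works.
REJECT_ROOT = re.compile(r"\b\w*reject\w*\b", re.IGNORECASE)


def find_reject_terms(text):
    """Return every word in `text` that contains the root 'reject'."""
    return [match.group(0) for match in REJECT_ROOT.finditer(text)]


sample = "The board rejected the proposal; the rejection was final."
print(find_reject_terms(sample))
```

Comparing such matches with the top-activating prompts would show where the explanation holds and where the neuron fires on other terms.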

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token            Logit
.”               -1.80
</em>            -1.71
睺               -1.67
 interesan       -1.62
 heuti           -1.62
 proporcionan    -1.59
któ              -1.52
 harán           -1.52
(blank token)    -1.50
 včetně          -1.49

Positive Logits

Token            Logit
 with            2.34
="               1.91
是個             1.79
 from            1.78
描く             1.73
of               1.68
 voyez           1.64
for              1.63
 part            1.60

Activations Density 0.007%
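
To put the density figure in context, here is a minimal sketch of the arithmetic, assuming the 0.007% density was measured over the 24,576 prompts of 128 tokens each listed under Configuration. The page does not confirm this, so the resulting count is only an estimate. The function name `estimated_active_tokens` is hypothetical.

```python
def estimated_active_tokens(n_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated number of tokens with nonzero activation)."""
    total_tokens = n_prompts * tokens_per_prompt
    # Density is given as a percentage, so divide by 100 to get a fraction.
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens


# Values taken from the Configuration and Activations Density entries.
# Assumption: the density was computed over exactly this prompt set.
total, active = estimated_active_tokens(24_576, 128, 0.007)
print(f"total tokens: {total:,}")           # total tokens: 3,145,728
print(f"estimated active tokens: {active:.0f}")
```

Under that assumption the neuron fires on roughly a couple of hundred tokens out of about 3.1 million, which fits a feature tied to a single word root.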