INDEX

Explanations

refuse, decline, reject

The neuron activates strongly on numeric tokens (digits and numbers) in the text.

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token          Logit
" допуска"     -0.86
"ishu"         -0.79
"ophila"       -0.78
" oppress"     -0.78
"knn"          -0.78
" MAIL"        -0.77
" Adaptive"    -0.77
" reopened"    -0.77
" схе"         -0.74
"зы"           -0.73

Positive Logits

Token          Logit
" declined"    3.97
" decline"     3.67
"拒绝"          3.59  (Chinese: "refuse")
" refusal"     3.56
" reject"      3.56
" rejection"   3.48
" rejecting"   3.47
" refused"     3.27
" rejected"    3.23
" refuse"      3.14

Activations Density 0.049%
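
For a rough sense of scale, the density figure can be turned into an approximate token count. The sketch below assumes that the 0.049% density is measured over the same 24,576 prompts of 128 tokens listed under Configuration. The page does not state this, and the function name is hypothetical.

```python
def approx_active_tokens(num_prompts: int, tokens_per_prompt: int,
                         density_percent: float) -> int:
    """Estimate how many tokens activate the feature.

    Assumption: the density is the fraction of all dashboard tokens
    on which the activation is nonzero.
    """
    total_tokens = num_prompts * tokens_per_prompt
    return round(total_tokens * density_percent / 100)


# Values taken from the Configuration and Activations Density lines.
total = 24_576 * 128
active = approx_active_tokens(24_576, 128, 0.049)
print(total, active)  # 3145728 1541
```

Under that assumption, the feature fires on roughly 1,500 of about 3.1 million tokens, which fits a narrow feature tied to refusal and rejection vocabulary.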