
Explanations

refused to participate or do

The neuron fires on infinitive constructions expressing refusal (e.g. “refused to …”).


Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted


Negative Logits

" dwindling"    -2.05
" crafty"       -1.94
"先日"           -1.85
"ar"            -1.85
"–"             -1.80
"遨"            -1.80
" lengthy"      -1.80
" enormous"     -1.79
"そして"         -1.76
" dozens"       -1.72

Positive Logits

"🫃"            2.05
" horloge"      1.91
"腘"            1.88
"脔"            1.81
" anormal"      1.74
" Signalez"     1.69
"乀"            1.68
");"            1.68
" micrófono"    1.65
" herv"         1.62
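Each logit list pairs a token with a weight. A common way to get such weights is to project the neuron's output direction onto the model's unembedding matrix and rank tokens by the resulting dot product. The page does not say whether it uses exactly this procedure, for example how it handles layer normalization, so treat that as an assumption. The toy sketch below uses a made-up two-dimensional direction and a four-token vocabulary borrowed from the lists. The functions `logit_effects` and `top_and_bottom` are hypothetical helpers, and the numbers are not the model's real weights.

```python
# Toy sketch (assumption): logit weights = feature/neuron output direction
# dotted with each token's unembedding column. All values are made up.

def logit_effects(direction, unembed, vocab):
    """Return (token, score) pairs, where score is direction . unembed[:, j]."""
    d_model = len(direction)
    scores = []
    for j, token in enumerate(vocab):
        score = sum(direction[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Split scores into the k most negative and the k most positive tokens."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest scores first
    positive = ranked[::-1][:k]    # largest scores first
    return negative, positive


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
direction = [1.0, -0.5]
unembed = [
    [-1.0, -0.5, 1.0, 0.5],  # dimension 0, one column per token
    [2.0, 1.0, 0.0, 0.5],    # dimension 1
]
vocab = [" dwindling", " crafty", " horloge", " anormal"]

neg, pos = top_and_bottom(logit_effects(direction, unembed, vocab), k=2)
print(neg)  # [(' dwindling', -2.0), (' crafty', -1.0)]
print(pos)  # [(' horloge', 1.0), (' anormal', 0.25)]
```

In a real model the same ranking would run over the full vocabulary. That is why rare or unrelated tokens (emoji, CJK characters, fragments such as ");") can surface at the extremes even when they have little to do with the neuron's explanation.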

Activation Density: 0.014%
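Here is the density figure in concrete terms, under one assumption the page does not state: that density is measured over the dashboard sample listed under Configuration (24,576 prompts of 128 tokens each). On that basis the neuron is active on roughly 440 of about 3.1 million tokens.

```python
# Assumption: density is computed over the dashboard sample (24,576 x 128 tokens).
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY = 0.014 / 100  # 0.014% expressed as a fraction

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
active_tokens = total_tokens * DENSITY

print(total_tokens)          # 3145728
print(round(active_tokens))  # 440
```

The displayed density is rounded to 0.014%, so under the same assumption the true count could lie anywhere from about 425 to about 456 tokens.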