Explanations

positive statements and endorsements

The neuron detects positive evaluative language expressing approval or a welcoming tone.

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces), logit

"ándote"  -1.16
"capai"  -1.13
" during"  -1.11
"міні"  -1.09
" vanske"  -1.08
"PostExecute"  -1.05
"🪒"  -1.05
"совет"  -1.05
" soupe"  -1.02
" utford"  -1.02

Positive Logits

Token (quoted to show leading spaces), logit

" endlich"  1.44
"可惜"  1.29
"it"  1.29
"终于"  1.20
" Teig"  1.20
" Endlich"  1.11
" finalmente"  1.11
" initiative"  1.10
" correctly"  1.10
" hopefully"  1.09

Activation Density: 0.022%
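
For a rough sense of scale, the density figure can be combined with the prompt configuration listed under Configuration. The sketch below assumes that density means the fraction of all dashboard tokens on which the neuron has a nonzero activation; the page does not define the term, so treat the result as an estimate. The function and variable names are illustrative and do not come from the dashboard.

```python
# Assumption: "density" is the share of dashboard tokens with nonzero activation.
NUM_PROMPTS = 24_576        # from the Configuration section
TOKENS_PER_PROMPT = 128     # from the Configuration section
DENSITY_PERCENT = 0.022     # reported activation density


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total token count, estimated number of activating tokens)."""
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100
    return total_tokens, active_tokens


total, active = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"{total:,} tokens in total, roughly {active:.0f} with nonzero activation")
```

Under that assumption, the neuron fires on only a few hundred of the roughly three million tokens in the dashboard sample.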