Explanations

positive evaluations and outcomes

The neuron is keyed to positive evaluative or praise words conveying approval or admiration.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token           Logit
"Specifik"      -0.97
"Enders"        -0.97
" WITHOUT"      -0.96
" ángeles"      -0.94
" kvadrat"      -0.93
" perangkat"    -0.93
"lación"        -0.93
" fön"          -0.92
"nó"            -0.92
"öra"           -0.91

Positive Logits

Token               Logit
" work"             1.81
"to"                1.80
"job"               1.77
" that"             1.48
" seeing"           1.45
" choice"           1.43
(not shown)         1.23
" contribution"     1.16
" hearing"          1.15
" effort"           1.11
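
Lists like the two above are commonly produced by projecting a neuron's output direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below illustrates that idea on random toy data. Whether this dashboard computes its values exactly this way is an assumption, and the names `w_out`, `W_U`, and `vocab` are hypothetical.

```python
import random

def logit_effects(w_out, W_U):
    """Project an output direction (length d_model) through an unembedding
    matrix stored as d_model rows of vocab_size columns."""
    vocab_size = len(W_U[0])
    return [
        sum(w_out[d] * W_U[d][t] for d in range(len(w_out)))
        for t in range(vocab_size)
    ]

def top_and_bottom(scores, vocab, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom

# Toy data only: random weights and a made-up vocabulary.
random.seed(0)
d_model, vocab_size = 8, 40
w_out = [random.gauss(0, 1) for _ in range(d_model)]
W_U = [[random.gauss(0, 1) for _ in range(vocab_size)] for _ in range(d_model)]
vocab = [f"tok{i}" for i in range(vocab_size)]

scores = logit_effects(w_out, W_U)
top, bottom = top_and_bottom(scores, vocab, k=3)
print("top:", top)
print("bottom:", bottom)
```

With a real model, `w_out` would be the neuron's output weights and `W_U` the model's unembedding matrix. The sorting step stays the same.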

Activations Density 0.017%
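
As a rough sanity check, the density figure can be converted into an approximate token count. The sketch below assumes that density means the fraction of dashboard tokens on which the neuron has a nonzero activation. The page does not state that definition, and the displayed percentage is rounded, so the result is only an estimate.

```python
# Figures taken from the Configuration and Activations Density lines above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.017

def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the dashboard prompts."""
    return num_prompts * tokens_per_prompt

def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate count of tokens on which the neuron fires.
    Assumes density = active tokens / total tokens (an assumption)."""
    return round(total * density_percent / 100)

total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)
print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,}")
```

Under that assumption the neuron fires on only a few hundred of roughly three million tokens, which is consistent with a narrowly selective feature.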