Explanations

"looks" followed by descriptions

The neuron fires on positive evaluative words and phrases, especially praise or "good-performance" descriptors (e.g. looks good, played well, positive).

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token               Logit
" aqueles"          -1.47
" rescate"          -1.45
" estabiliz"        -1.43
" kupa"             -1.42
" moldura"          -1.42
" asientos"         -1.38
" perfeitamente"    -1.37
" pulseira"         -1.36
" stary"            -1.34
"apka"              -1.32

Positive Logits

Token       Logit
"in"        1.72
"’"         1.68
"for"       1.55
"an"        1.41
"ка"        1.31
"ta"        1.30
"gen"       1.30
"as"        1.30
"ha"        1.28
"</em>"     1.24

Activations Density 0.023%
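
To give a sense of scale, the dashboard configuration (24,576 prompts of 128 tokens each) can be combined with the reported density of 0.023%. The sketch below assumes that the density is the fraction of token positions in that prompt set on which the neuron has a nonzero activation; the page does not state this, so the resulting count is only a rough estimate. The function and constant names are illustrative and not part of any dashboard API.

```python
# Figures copied from the dashboard page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.023  # reported activation density, in percent


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated count of activating tokens.

    Assumption: density is the share of token positions with a
    nonzero activation, measured over the same prompt set.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated activating tokens: {active:.0f}")
```

Under that assumption the neuron would be active on roughly seven hundred of about 3.1 million tokens, which is consistent with a sparse, narrowly selective feature.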