INDEX

Explanations

specific phrases and titles

The neuron is detecting mentions of “fun” and other strongly positive, enjoyment-oriented words describing pleasurable or upbeat experiences.

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token          Logit
"𝒙"            -1.45
" Küche"       -1.25
"or"           -1.20
" أشهر"        -1.17
"pushd"        -1.16
"累了"         -1.13
"iços"         -1.13
"ciri"         -1.11
"𝛽"            -1.10
" familiar"    -1.07

Positive Logits

Token          Logit
" هناك"        1.61
" there"       1.50
" myself"      1.48
"dokument"     1.41
" Ketika"      1.41
" Misalnya"    1.38
"𝑈"            1.38
" Bagaimana"   1.36
" khán"        1.34
" Especific"   1.30

Activations Density: 0.003%
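
To put the density figure in context, the sketch below works out how many token positions it would correspond to. It assumes (this is not stated on the page) that the density is the fraction of dashboard tokens on which the feature fires, measured over the 24,576 prompts of 128 tokens listed under Configuration. The constant and function names are hypothetical, chosen only for illustration.

```python
# Figures quoted on the dashboard.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.003  # "Activations Density 0.003%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density_percent: float) -> float:
    """Token positions implied by a density given in percent.

    Assumption: density = active tokens / total tokens.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens:  {total:,}")
print(f"active tokens: about {round(active)}")
```

Under that assumption the feature would fire on roughly 94 of about 3.1 million token positions. Because the displayed density is rounded to one significant figure, the true count could differ noticeably.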