Explanations

don't just

The neuron activates strongly on adverbs, negations, and emphasis words that qualify or limit a statement (e.g., don’t, never, only, just, really).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

 sudah        -1.05
libert        -0.98
PENUTUP       -0.96
 nicht        -0.96
 parece       -0.94
 tidak        -0.93
 quitté       -0.92
 tampaknya    -0.92
("")]         -0.91
 không        -0.86

Positive Logits

 seek           1.37
 seeking        1.24
 fear           1.23
 seeks          1.20
 ever           1.16
 just           1.10
 merely         1.10
 Verbreitung    1.07
shy             1.06
ask             1.05

Activations Density 0.033%
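
To put the density figure in context, here is a minimal sketch of the arithmetic. It assumes the density is the share of dashboard tokens on which the neuron has a nonzero activation, which the page itself does not state. The constants are copied from the configuration (24,576 prompts, 128 tokens each); the function names are hypothetical.

```python
# Figures copied from the dashboard configuration.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.033  # "Activations Density", as displayed (rounded)


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Number of tokens covered by the dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Estimated count of tokens with nonzero activation.

    Assumption: density is the fraction of tokens on which the neuron fires.
    """
    return density_percent / 100 * n_tokens


n_tokens = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(DENSITY_PERCENT, n_tokens)
print(f"{n_tokens:,} tokens, roughly {active:.0f} with nonzero activation")
```

Under that assumption the neuron fires on about a thousand of the roughly 3.1 million tokens. Because the displayed percentage is rounded, the count is only an approximation.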