Explanations

no negation or directness

The neuron strongly activates on the key noun or verb that immediately follows a negation (e.g. “no time left,” “no equivocation,” “don’t sugar-coat,” “no hidden charges,” “no excuses”), i.e. the word that names what is being denied, refused, or absent.
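
The pattern described above can be approximated with a rule-based sketch. This is only an illustration of the surface pattern, not the neuron's actual computation: the cue list `NEGATION_CUES` and the helper `words_after_negation` are hypothetical names assumed here, and the sketch returns whichever word directly follows a cue, even when that word is a modifier rather than the key noun or verb.

```python
import re

# Hypothetical cue list, assumed for illustration; the dashboard does not define one.
NEGATION_CUES = {"no", "not", "never", "without",
                 "don't", "doesn't", "didn't", "won't", "can't"}

def words_after_negation(text):
    """Return each word that immediately follows a negation cue."""
    # Lowercase and normalize curly apostrophes so "Don’t" matches "don't".
    normalized = text.lower().replace("’", "'")
    # Words may contain one apostrophe part and hyphenated parts ("sugar-coat").
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?(?:-[a-z]+)*", normalized)
    return [nxt for cur, nxt in zip(tokens, tokens[1:]) if cur in NEGATION_CUES]

print(words_after_negation("There is no time left, so don’t sugar-coat it."))
```

A real check of the explanation would compare such candidate positions against the recorded activations rather than rely on a fixed cue list.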

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)  Logit
" pantaloni"  -1.19
"きっかけ"  -1.13
" vẫn"  -1.13
"삭제"  -1.11
" something"  -1.10
"로그인"  -1.09
" rather"  -1.07
"でもあります"  -1.05
"rather"  -0.98
"Ejemplo"  -0.98

Positive Logits

Token (quoted to show leading spaces)  Logit
"or"  1.88
" anymore"  1.43
" this"  1.23
" here"  1.23
"nor"  1.19
"any"  1.16
" second"  1.05
" или"  1.03
"there"  0.98
" только"  0.96

Activations Density 0.022%
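
To put the density figure in scale, the sketch below converts it into an approximate token count. It rests on two assumptions the page does not state: that density means the fraction of tokens on which the feature has a non-zero activation, and that it was measured over the dashboard prompts listed under Configuration. The function name `approx_active_tokens` is hypothetical, and because 0.022% is a rounded figure the result is only approximate.

```python
def approx_active_tokens(n_prompts, tokens_per_prompt, density_percent):
    """Estimate how many tokens activate the feature, given a density in percent."""
    total_tokens = n_prompts * tokens_per_prompt
    # Assumption: density is the fraction of all tokens with non-zero activation.
    return total_tokens, round(total_tokens * density_percent / 100)

total, active = approx_active_tokens(24_576, 128, 0.022)
print(f"{active} of {total} tokens")
```

Under these assumptions the feature fires on a few hundred of roughly three million tokens, which is consistent with a narrowly selective feature.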