Explanations

presence or falsehood

The neuron fires on first-person self-references and introspective language (e.g. “I,” “myself,” “am,” “convincing myself,” expressions of personal feeling).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

 Coutinho	-0.77
ďaka	-0.73
им	-0.72
 "'");	-0.72
letting	-0.70
lp	-0.70
 Compact	-0.69
omas	-0.69
Detached	-0.69
 Eddie	-0.68

Positive Logits

 when	1.05
 something	0.96
 nonexistent	0.95
YOND	0.90
 inexist	0.86
 which	0.84
false	0.84
يسة	0.84
 false	0.82
 jornadas	0.82

Activations Density 0.048%
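
To give the density figure a sense of scale, it can be combined with the dashboard configuration (24,576 prompts of 128 tokens each). The sketch below is only a rough estimate: it assumes the 0.048% density was measured over exactly those dashboard tokens, which the page does not state, and the constant and function names are illustrative.

```python
# Rough estimate of how many dashboard tokens activate this feature.
# Assumption (not stated on the page): the activation density is
# measured over every token of the dashboard prompts.

NUM_PROMPTS = 24_576        # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128     # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.048     # from "Activations Density"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density_percent: float) -> float:
    """Approximate count of tokens with a nonzero activation."""
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, DENSITY_PERCENT)
print(f"{total:,} tokens in total, about {round(active):,} active")
```

Under that assumption, the feature would fire on roughly 1,500 of about 3.1 million tokens, which is consistent with a sparse feature.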