Explanations

'might be', 'aware that', 'following closely', 'finally found', 'talked about'

The neuron fires on first-person self-references (tokens such as “I”, “I’m”, and “I’ve”).

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

ﻠ  0.67
Ꭳ  0.57
colm  0.56
ﻤ  0.56
canic  0.55
ﻨ  0.55
ல்கள்  0.55
IELD  0.55
<0x83>  0.54
景  0.54

Positive Logits

 myself  0.74
 pribadi  0.70
 jestem  0.66
 faccio  0.61
 unwittingly  0.59
 pegawai  0.59
 dieses  0.58
 defeats  0.58
 persönlich  0.58
 hastily  0.58

Activations Density 0.085%
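
To give the density figure a sense of scale, here is a rough back-of-the-envelope sketch. It assumes (the page does not say so) that the 0.085% density was measured over the full prompt set listed under Prompts, i.e. 392,802 prompts of 256 tokens each. The function names are made up for illustration.

```python
# Rough scale estimate for the activation density.
# ASSUMPTION: the density is a fraction of all tokens in the listed
# prompt set; the dashboard does not confirm this.

def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(num_prompts: int,
                            tokens_per_prompt: int,
                            density_percent: float) -> float:
    """Approximate count of tokens on which the neuron is active."""
    return total_tokens(num_prompts, tokens_per_prompt) * density_percent / 100.0


if __name__ == "__main__":
    n_tokens = total_tokens(392_802, 256)
    n_active = estimated_active_tokens(392_802, 256, 0.085)
    print(f"total tokens:        {n_tokens:,}")
    print(f"approx. active tokens: {n_active:,.0f}")
```

Under that assumption the neuron would be active on roughly 85 thousand of about 100 million tokens, which fits a sparse, fairly specific feature.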