INDEX

Explanations

the important need

The neuron fires most strongly on low‐frequency or “uncommon” BPE subword tokens, e.g. isolated apostrophes, subword prefixes like “inc” or “leth,” rare full words like “great,” standalone digits, and sentence‐initial capitals. In other words, it flags tokens that are infrequent or that otherwise stand out in the vocabulary.

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

platte    -1.00
iaire    -0.95
 ワゴン    -0.90
 situe    -0.89
maschine    -0.89
 Dokter    -0.88
künfte    -0.87
orios    -0.86
 vive    -0.85
 результат    -0.84

Positive Logits

 Sünde    0.91
 able    0.88
 their    0.88
kao    0.88
 mecánico    0.88
 that    0.87
젖    0.86
 dianteiro    0.85
 easily    0.85
 practically    0.85

Activations Density 0.184%
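
To put the density figure in context, the short sketch below combines it with the dashboard configuration (24,576 prompts of 128 tokens each). It assumes, and this is only an assumption, that the density is the fraction of all dashboard tokens on which the neuron has a nonzero activation. The constant and function names are illustrative and do not come from the dashboard.

```python
# Figures copied from the dashboard configuration and density readout.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.184 / 100  # 0.184% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the dashboard sample."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(num_tokens: int, density: float) -> float:
    """Approximate count of token positions with a nonzero activation.

    Assumption: density = active token positions / all token positions.
    """
    return num_tokens * density


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, ACTIVATION_DENSITY)

print(f"total tokens: {total:,}")               # total tokens: 3,145,728
print(f"approx. active tokens: {active:,.0f}")  # approx. active tokens: 5,788
```

Under that reading, the neuron is active on roughly 5,800 of about 3.1 million token positions, i.e. fewer than one position in every 500.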