Explanations

don't just

The neuron responds selectively to short, common function words that express negation, quantification, or basic comparison (e.g. “not,” “don’t,” “all,” “just,” “more,” “than”).

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

" aucune"  0.61
"лен"  0.59
" কিনা"  0.59
" idiot"  0.59
" hardly"  0.57
"aucune"  0.56
"no"  0.54
" nobody"  0.51
" unenforceable"  0.51
" aldrig"  0.51

Positive Logits

"そして"  0.60
"↵"  0.59
"globals"  0.57
"Lotus"  0.55
"Hes"  0.54
"Д"  0.54
" Surprisingly"  0.53
" tačiau"  0.53
"然而"  0.53
" Additionally"  0.52
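The page does not say how the logit lists are produced. A common approach, assumed here rather than confirmed by the source, is to project the neuron's output direction through the model's unembedding matrix and keep the tokens with the most negative and most positive resulting values. The sketch below illustrates that idea with made-up toy numbers; `logit_effects`, `top_logit_tokens`, the toy vocabulary, and the weights are all hypothetical and are not taken from the dashboard.

```python
def logit_effects(direction, unembed):
    """Dot the output direction with each vocabulary column of the unembedding.

    direction: list of floats, length d_model (hypothetical neuron output direction)
    unembed:   d_model rows, each with vocab_size floats (hypothetical weights)
    """
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]


def top_logit_tokens(direction, unembed, vocab, k=2):
    """Return (most_negative, most_positive) lists of (token, value) pairs."""
    effects = logit_effects(direction, unembed)
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    most_negative = ranked[:k]         # smallest values first
    most_positive = ranked[::-1][:k]   # largest values first
    return most_negative, most_positive


# Toy data only: five placeholder tokens and a 2-dimensional model.
vocab = ["a", "b", "c", "d", "e"]
direction = [1.0, 0.0]
unembed = [
    [-2.0, -1.0, 0.0, 1.0, 3.0],
    [0.5, 0.5, 0.5, 0.5, 0.5],
]

negative, positive = top_logit_tokens(direction, unembed, vocab, k=2)
print(negative)  # [('a', -2.0), ('b', -1.0)]
print(positive)  # [('e', 3.0), ('d', 1.0)]
```

Under this reading, the tokens in the two lists are the ones the neuron pushes down or up most strongly when it fires, which is a different question from which input tokens activate it.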

Activations Density 0.435%
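To put the density figure in context, the arithmetic below combines it with the prompt configuration listed on this page. It assumes, without confirmation from the source, that the density was measured over the same 392,802 prompts of 256 tokens each; if it was computed on a different sample, the second number does not apply.

```python
# Figures copied from the dashboard.
PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
DENSITY_PERCENT = 0.435

# Total token positions covered by the dashboard prompts.
total_tokens = PROMPTS * TOKENS_PER_PROMPT

# Rough count of token positions where the neuron is active,
# ASSUMING the density refers to this same prompt set.
approx_active = round(total_tokens * DENSITY_PERCENT / 100)

print(total_tokens)   # 100557312
print(approx_active)  # 437424
```

In other words, the neuron would be active on roughly one token in every 230 under that assumption.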