Explanations

sequences after English words

The neuron responds strongly to uncommon, content-bearing words (e.g. “going,” “large,” “off topic,” “found,” “obtained”); that is, it flags rare or infrequently used tokens.

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

 UNIV

0.95

ன்மையான

0.84

bbq

0.84

 PROCED

0.83

 نیست

0.82

 influência

0.80

════

0.79

 安心

0.78

 completos

0.77

 TERMIN

0.76

Positive Logits

Aby

0.73

За

0.72

he

0.71

ма

0.71

Mi

0.71

Pada

0.71

Prz

0.70

Он

0.69

্যোগ

0.69

Aw

0.68

Activations Density 0.000%