Explanations

Vorlag

The neuron fires on the leading subword of capitalized words, typically the first piece of a proper noun.
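
As a rough illustration of what "leading subword of a capitalized word" means, the sketch below flags such positions in a token list. It assumes a GPT-2-style convention in which a token beginning with a space (or the first token in the sequence) starts a new word; the function name `leading_capitalized_subwords` and the example tokens are hypothetical and do not come from the dashboard.

```python
def leading_capitalized_subwords(tokens):
    """Return indices of tokens that begin a capitalized word.

    Assumption: a token starts a new word if it is the first token
    or if it begins with a space (GPT-2-style subword convention).
    """
    hits = []
    for i, tok in enumerate(tokens):
        starts_word = i == 0 or tok.startswith(" ")
        text = tok.lstrip(" ")
        # text[:1] is "" for an empty token, and "".isupper() is False
        if starts_word and text[:1].isupper():
            hits.append(i)
    return hits


# Hypothetical tokenization of "The city of Worcester is old"
tokens = [" The", " city", " of", " Wor", "cester", " is", " old"]
print(leading_capitalized_subwords(tokens))  # [0, 3]
```

Note that the check also flags " The", which is capitalized but not a proper noun, so capitalization alone is only a proxy for the behavior described.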

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token           Value
" réduite"      0.57
"éristiques"    0.56
" wiem"         0.55
" 확대"          0.55
"つい"           0.54
"পেপার"         0.54
" seconded"     0.53
" সর্বনাশ"      0.53
" proxim"       0.52
" reduz"        0.52

Positive Logits

Token           Value
"Kel"           0.64
"Wo"            0.62
"WO"            0.61
"WO"            0.61
"Woo"           0.60
" Вул"          0.59
"KEL"           0.58
"erful"         0.58
"Wor"           0.58
"Wor"           0.57

Activations Density 0.227%
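
To put the density figure in context, the short calculation below combines it with the prompt configuration listed above. It is a back-of-the-envelope sketch under two assumptions that the page does not confirm: that density means the fraction of token positions with a nonzero activation, and that it was measured over the same 392,802 prompts of 256 tokens each.

```python
NUM_PROMPTS = 392_802        # from the dashboard configuration
TOKENS_PER_PROMPT = 256      # from the dashboard configuration
DENSITY = 0.227 / 100        # 0.227% expressed as a fraction

# Total token positions in the prompt set
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = active token positions / total token positions
approx_active = round(total_tokens * DENSITY)

print(total_tokens)   # 100557312
print(approx_active)
```

Under those assumptions, the neuron would be active on roughly 228 thousand of about 100.6 million token positions.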