Explanations

positive descriptors

The neuron activates strongly on adjectives and qualifier words that describe an attribute (e.g. “簡単な” (Japanese for “easy”), “useful,” “important,” “excellent”).

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token           Logit
"foundland"     -1.00
"gulls"         -0.93
"ês"            -0.91
" 있어"          -0.91
" cualquier"    -0.90
" любых"        -0.88
" любые"        -0.88
"և"             -0.87
"良いです"       -0.87
" mécanisme"    -0.86

Positive Logits

Token           Logit
" things"       1.56
" thing"        1.43
" ones"         1.24
" stuff"        1.15
"데"            1.09
"things"        0.94
"ilibre"        0.93
"weise"         0.93
" şekilde"      0.89
" form"         0.88

Activations Density 0.002%
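
To give a sense of scale, the density figure can be combined with the dashboard configuration to estimate how many tokens the neuron fires on. This is only a rough sketch under two assumptions the page does not confirm: that the density is the fraction of dashboard tokens with a nonzero activation, and that it was measured over the same 24,576 prompts of 128 tokens each. The function names are hypothetical and used for illustration only.

```python
# Rough estimate of how many dashboard tokens activate this neuron.
# Assumption (not stated on the page): "Activations Density" is the share of
# all dashboard tokens on which the neuron has a nonzero activation.

def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(density_percent: float, total: int) -> float:
    """Convert a density given in percent into an approximate token count."""
    return density_percent / 100.0 * total


# Figures taken from the Configuration and Activations Density entries.
TOTAL = total_tokens(24_576, 128)
ACTIVE = estimated_active_tokens(0.002, TOTAL)

print(f"total tokens: {TOTAL:,}")                  # 3,145,728
print(f"estimated active tokens: {ACTIVE:.0f}")    # about 63
```

Because the displayed density is rounded to one significant figure, the true count could plausibly lie anywhere from the high 40s to the high 70s; either way, the neuron is very sparse.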