Explanations

unintended consequences or errors

The neuron is essentially responding to mid‐frequency, content‐bearing words (i.e. relatively “rare” nouns and verbs) rather than common function words.

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

💙    1.04
💚    0.96
💫    0.94
💪    0.91
 vivió    0.91
🦋    0.90
🌺    0.89
天使    0.88
☀️    0.88
✨    0.88

Positive Logits

 misleading    1.44
 unduly    1.35
 inaccurate    1.31
 unrealistic    1.31
 erroneous    1.27
 insufficient    1.25
 inadvert    1.23
 unnecessarily    1.20
 unreasonable    1.19
 inconvenient    1.18

Activations Density 0.871%
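
As a rough sanity check, the density figure can be combined with the prompt counts listed under Configuration to estimate how many tokens the feature fires on. This is only a sketch under two assumptions that the page does not confirm: that "Activations Density" means the fraction of tokens with a nonzero activation, and that it was measured over the same 392,802 prompts of 256 tokens each. The function name is hypothetical.

```python
# Values copied from the dashboard text; nothing here is measured independently.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
DENSITY_PERCENT = 0.871


def estimate_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total_tokens, estimated_active_tokens).

    ASSUMPTION: density is the share of tokens with nonzero activation,
    computed over exactly this prompt set. The dashboard does not state this.
    """
    total = num_prompts * tokens_per_prompt
    # Convert the percentage to a fraction, then round to whole tokens.
    active = round(total * density_percent / 100)
    return total, active


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {active_tokens:,}")
```

Under those assumptions the feature would be active on somewhat under one token in a hundred, which suggests a fairly sparse feature rather than one that fires on most content words.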