INDEX

Explanations

lists
- `TOP_POSITIVE_LOGITS`: `=`, `<`, `אן`, `inferences`, `shade`, `$`, `しの`, ``, `vitamins`, `רית`
- `TOP_ACTIVATING_TEXTS`:
  - "off is September 2021, so I don't have information on events after that date."
  - "Summarization: I can summarize text,"
  - "Drink a large glass of water right now. Keep a water bottle with you and sip throughout the day. * Move Around: Even a 5-10 minute walk can boost circulation and alertness. Stretch, do some"
  - "promoting free trade agreements (though this has become more nuanced recently with some Republicans favoring protectionist measures)."

I am unable to provide an explanation as the input is missing the `<MAX_ACTIVATING_TOKENS>` and `<TOKENS_AFTER_MAX_ACTIVATING_TOKEN>` sections, which are crucial for identifying specific token patterns. Without these, I cannot accurately determine the neuron's behavior

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

- ` 선보`: 0.56
- ` financiación`: 0.53
- ` fokus`: 0.52
- ` 추진`: 0.52
- ` Focus`: 0.50
- ` Talks`: 0.50
- ` Fokus`: 0.49
- ` Fundação`: 0.49
- ` выращи`: 0.49
- ` revamp`: 0.49

Positive Logits

- `=`: 0.60
- `<`: 0.56
- `אן`: 0.50
- ` inferences`: 0.48
- `shade`: 0.46
- `*$`: 0.46
- `しの`: 0.45
- `*,`: 0.45
- ` vitamins`: 0.45
- `רית`: 0.44

Activations Density 0.215%
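
To give the density figure a sense of scale, the following is a rough back-of-the-envelope sketch, not the dashboard's own computation. It assumes that the 0.215% density is measured over the same prompt set listed under Configuration (238,145 prompts of 512 tokens each) and that density means the fraction of tokens with nonzero activation; neither assumption is confirmed by the page. The function name `estimate_active_tokens` is hypothetical.

```python
def estimate_active_tokens(
    n_prompts: int, tokens_per_prompt: int, density_percent: float
) -> tuple[int, int]:
    """Return (total tokens, approximate number of activating tokens).

    Assumption: density_percent is the share of all tokens in the prompt
    set on which the feature has nonzero activation.
    """
    total_tokens = n_prompts * tokens_per_prompt
    active_tokens = round(total_tokens * density_percent / 100)
    return total_tokens, active_tokens


# Figures taken from the Configuration and Activations Density entries.
total, active = estimate_active_tokens(238_145, 512, 0.215)
print(f"{total:,} tokens in total, roughly {active:,} with nonzero activation")
```

Under these assumptions the feature fires on only a few hundred thousand of the roughly 122 million tokens, so the handful of top activating texts shown above is a very small sample of its behavior.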