INDEX

Explanations

Analyzing Neuron Behavior

I am currently focused on distilling the core mechanism of neuron activation from the provided input lists. My aim is to encapsulate this behavior into a brief, specific phrase, avoiding any unnecessary preamble.

`MAX_ACTIVATING_TOKENS`: `api`, `late`, `beginning`, `problems`.

`TOKENS_AFTER_MAX_ACTIVATING_TOKEN`: `)` (after `api`), `of` (after `api`), `Capital` (after `late`), `raw` (after `beginning`), `for` (after `ake`), `and` (after `ed`), `?"` (after `problems`). These are often punctuation or common words.

`TOP_POSITIVE_LOGITS` contain diverse words in different scripts, not immediately forming a clear semantic pattern with the activating tokens.

`TOP_ACTIVATING_TEXTS` include code snippets (`import win32gui`, `api`), descriptions of time (`late 20th-century`, `beginning of LLMs`), and problem descriptions (`creates problems`).

The common thread seems to be common programming/library terms (like `api`) and concepts of time (`late`, `beginning`) or issues (`problems`). The tokens after `api` (`of`, `)`) suggest it might be related to function calls or definitions. The token `Capital` after `late` and

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1
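
For a sense of scale, the prompt configuration above can be turned into a total token count. The sketch below is only illustrative: it assumes every prompt is exactly 512 tokens long, and the names `total_tokens` and `max_tokens_for_zero_display` are hypothetical helpers, not part of any dashboard API. The second helper further assumes that a density figure is computed over this same token set and rounded to three decimal places of a percent, which the page does not confirm.

```python
# Values copied from the Configuration section.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512  # assumption: every prompt is exactly 512 tokens


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total tokens in the prompt set, assuming fixed-length prompts."""
    return num_prompts * tokens_per_prompt


def max_tokens_for_zero_display(total: int, decimals: int = 3) -> float:
    """Upper bound on activating tokens if a density percentage rounds to zero.

    Assumption (not confirmed by the page): density is the share of these
    tokens with non-zero activation, shown as a percentage rounded to
    `decimals` places. Anything below half of the last displayed digit
    would then be displayed as 0.
    """
    half_step_percent = 0.5 * 10 ** (-decimals)  # 0.0005 for 3 decimals
    return total * half_step_percent / 100


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    print(f"total tokens: {total:,}")
    print(f"bound on activating tokens: {max_tokens_for_zero_display(total):.1f}")
```

Under these assumptions, a density that displays as zero would still be compatible with a few hundred activating tokens, so a zero reading does not by itself mean the neuron never fires.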

Negative Logits

(token not shown)  0.55
`doh`  0.54
(token not shown)  0.52
(token not shown)  0.48
`HON`  0.48
` Diplomacy`  0.48
`esk`  0.47
`חות`  0.47
` sklad`  0.47

Positive Logits

` गर्`  0.49
`PAssignment`  0.47
`alupe`  0.47
` کال`  0.46
` verpflichtet`  0.46
` نیست`  0.45
`ຮູ້`  0.45
` χρει`  0.45
` bolstered`  0.45
`ຮ`  0.44

Activations Density 0.000%