Explanations

The task is to explain the behavior of a neuron concisely using the provided lists. The explanation should be a phrase of 3 to 20 words describing what the neuron detects or predicts, based on patterns in the lists. Let's look at the lists:

* MAX_ACTIVATING_TOKENS: `Mac`, `people`, `Mac`, `Mac`, `NAS`, `Red`, `[`, `people`, `and`, `Specifically`
  * Recurring tokens: `Mac`, `people`.
  * Other specific tokens: `NAS`, `Red`, `Specifically`.
* TOKENS_AFTER_MAX_ACTIVATING_TOKEN: `).`, `-`, `)`, `).`, `(`, `es`, `].`, `they`, `analyzing`, `,`
  * These are mostly punctuation or short suffixes/prefixes, suggesting grammatical context or list markers.
* TOP_POSITIVE_LOGITS: `だって`, `с`, `ع`, `ட`, `其他`, `社会`, `した`, `ih`, `お`, `íamos`
  * This list is dominated by non-English tokens: Japanese, Chinese, Tamil, and Portuguese, plus Cyrillic and Arabic script.

Explanation: multilingual context or specific items like mac/nas/red
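The lists above are derived from activation records. A minimal sketch of how MAX_ACTIVATING_TOKENS and TOKENS_AFTER_MAX_ACTIVATING_TOKEN could be assembled, assuming each prompt contributes only the token at its single strongest activation and the prompts are then ranked by that peak. The helper `max_activating_lists` and the toy prompts are hypothetical and not the dashboard's actual pipeline:

```python
from typing import List, Tuple


def max_activating_lists(
    prompts: List[List[str]],
    activations: List[List[float]],
    k: int = 10,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: find each prompt's peak activation, then keep
    the k prompts with the highest peaks (assumed ranking scheme)."""
    peaks = []
    for tokens, acts in zip(prompts, activations):
        # Position of the strongest activation in this prompt.
        pos = max(range(len(acts)), key=lambda i: acts[i])
        # Token that follows the peak, or "" if the peak is the last token.
        after = tokens[pos + 1] if pos + 1 < len(tokens) else ""
        peaks.append((acts[pos], tokens[pos], after))
    # Strongest peaks first.
    peaks.sort(key=lambda p: p[0], reverse=True)
    top = peaks[:k]
    return [p[1] for p in top], [p[2] for p in top]


# Toy data (invented for illustration only).
prompts = [
    ["I", "use", "a", "Mac", ")."],
    ["Ask", "people", "they", "know"],
    ["the", "NAS", "-", "box"],
]
activations = [
    [0.0, 0.1, 0.0, 2.5, 0.2],
    [0.0, 1.8, 0.3, 0.0],
    [0.0, 2.1, 0.4, 0.0],
]
max_tok, after_tok = max_activating_lists(prompts, activations, k=2)
print(max_tok, after_tok)
```

With `k=2`, the `people` prompt (peak 1.8) falls below the `Mac` (2.5) and `NAS` (2.1) prompts and is dropped.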

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each
Dataset (Dashboard): lmsys + oasst1
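The configuration implies the total number of token positions scanned. A short worked calculation, assuming every prompt fills all 512 positions with no padding (the page does not say whether padding is counted):

```python
# Dashboard configuration as stated above; "no padding" is an assumption.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
print(f"{total_tokens:,} tokens")  # 121,930,240 tokens
```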

Negative Logits

Token            Value
" यामुळे"         1.01
" whatnot"       0.97
" ellipso"       0.94
" prioritized"   0.92
" ferrite"       0.91
" contaminate"   0.90
" automate"      0.89
" incub"         0.89
" interferon"    0.89
" gradually"     0.88

Positive Logits

Token            Value
"だって"          1.05
"с"              1.04
"ع"              1.02
"ட"              1.02
"其他"            0.96
"社会"            0.96
"した"            0.93
"ih"             0.93
"お"              0.93
"íamos"          0.93
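The page does not say how these logit lists are computed. A minimal sketch of one common direct-logit approach, assuming each token is scored by the dot product of the neuron's output weight vector with that token's unembedding vector (ignoring any final layer normalization). `logit_effects`, the 2-dimensional vectors, and the toy vocabulary are hypothetical. The sketch returns signed scores; whether the negative list above reports magnitudes is not stated on the page.

```python
from typing import Dict, List, Tuple


def logit_effects(
    neuron_out: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score each vocabulary token by <neuron_out, unembedding vector>
    and return the k highest and k lowest scores (assumed method)."""
    scores = {
        tok: sum(w * u for w, u in zip(neuron_out, vec))
        for tok, vec in unembed.items()
    }
    # Largest positive effects first.
    top_positive = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Most negative effects first.
    top_negative = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    return top_positive, top_negative


# Toy 2-dimensional model (invented for illustration only).
neuron_out = [1.0, -0.5]
unembed = {
    "だって": [1.0, 0.0],
    "с": [0.5, -0.8],
    " whatnot": [-1.0, 0.5],
    " gradually": [0.0, 1.0],
}
pos, neg = logit_effects(neuron_out, unembed, k=2)
print(pos)
print(neg)
```

In a real model the unembedding would be a vocabulary-sized matrix and the same ranking would be done with a matrix-vector product.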

Activation Density: 0.001%
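Activation density is the share of token positions on which the neuron fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the page does not state the threshold); `activation_density` is a hypothetical helper:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    acts = list(activations)
    if not acts:
        return 0.0
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


# Toy example: 1 active position out of 100,000 gives 0.001%.
acts = [0.0] * 100_000
acts[42] = 3.2
print(f"{activation_density(acts):.3f}%")  # 0.001%
```

If the density were measured over all 238,145 × 512 = 121,930,240 positions in the configuration (an assumption), 0.001% would correspond to roughly 1,219 active tokens.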