INDEX

Explanations

Finding patterns in lists.
MAX_ACTIVATING_TOKENS: structural elements.
TOKENS_AFTER_MAX_ACTIVATING_TOKEN: specific nouns.
TOP_POSITIVE_LOGITS: specific words, names, concepts.
TOP_ACTIVATING_TEXTS: descriptive of specific domains or entities.
The neuron seems to be triggered by specific technical terms, names, or defined concepts presented in the text. If the neuron is looking for specific items, what's a concise way to say it? Let's try to relate it to the examples: 'generate', 'model', 'group' (Method 2) are specific nouns. 'Gideon', 'War', 'Ge


Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1


Negative Logits

'цтва'  0.44
' Pari'  0.39
'dex'  0.37
'kr'  0.36
'এক্স'  0.36
'Imp'  0.36
'Bub'  0.36
'kr'  0.35
'CharAt'  0.35
'MP'  0.35

Positive Logits

' Gideon'  0.47
' Greedy'  0.40
'丟'  0.40
' আক্রমন'  0.39
'War'  0.39
'ichts'  0.39
' Geek'  0.39
'൮'  0.39
' বালক'  0.38
'Way'  0.38
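Logit lists like the two above are typically obtained by projecting a unit's output direction onto the vocabulary through the model's unembedding matrix and keeping the most positive and most negative entries. This page does not state how its lists were computed, so the following is only a minimal sketch under that assumption; the function names, the toy direction, the toy unembedding matrix, and the three-token vocabulary are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical helper: computes direction @ W_U, where w_u has shape
# [d_model][vocab_size]. Assumes the dashboard logits come from this projection.
def project_to_vocab(direction: List[float], w_u: List[List[float]]) -> List[float]:
    vocab_size = len(w_u[0])
    return [
        sum(direction[i] * w_u[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]

# Hypothetical helper: returns (top positive, top negative) token/score pairs.
def top_logits(
    direction: List[float],
    w_u: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    scores = project_to_vocab(direction, w_u)
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three tokens.
    toy_direction = [1.0, -0.5]
    toy_w_u = [[0.2, -0.1, 0.4], [0.1, 0.3, -0.2]]
    pos, neg = top_logits(toy_direction, toy_w_u, ["a", "b", "c"], k=2)
    print(pos)
    print(neg)
```

With a real model, `direction` would be the unit's output weights and `w_u` its unembedding matrix; sorting the full vocabulary is simple but, for large vocabularies, a partial selection (for example `heapq.nlargest`) avoids a full sort.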

Activation Density: 0.000%
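Activation density is usually reported as the percentage of dataset tokens on which a unit fires. The page does not define "fires", so this minimal sketch assumes an activation strictly above a threshold of 0.0 and assumes every position of the 238,145 dashboard prompts (512 tokens each) is counted; the helper name and the threshold are hypothetical.

```python
from typing import Iterable, Sequence

# Hypothetical helper: percentage of token positions whose activation exceeds
# `threshold`. The 0.0 default is an assumption, not taken from the dashboard.
def activation_density(
    activations: Iterable[Sequence[float]], threshold: float = 0.0
) -> float:
    total = 0
    active = 0
    for prompt in activations:  # one sequence of per-token activations per prompt
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    return 100.0 * active / total if total else 0.0

# Token count implied by the configuration fields, assuming every position is scored.
PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
TOTAL_TOKENS = PROMPTS * TOKENS_PER_PROMPT  # 121,930,240

if __name__ == "__main__":
    toy = [[0.0, 1.2], [0.0, 0.0]]  # one active position out of four
    print(f"{activation_density(toy):.3f}%")
```

Because the result is a percentage over roughly 122 million positions under this assumption, a unit that activates on only a few hundred tokens would still display as 0.000% at three decimal places.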