Explanations

it's

Reasoning: The `MAX_ACTIVATING_TOKENS` and `TOKENS_AFTER_MAX_ACTIVATING_TOKEN` lists show common tokens like 'it', ',', 'a', 'is', 'or', and then words like 'the', 'carbon', 'jail', 'ever', 'too', 'primary', 'made', 'more'.

The `TOP_ACTIVATING_TEXTS` provide context. Several examples seem to relate to the beginning of a sentence or clause, often starting with "it's" or discussing what "it" is or does.

- "Anthrax is a deadly pathogen. It's a serious threat..."
- "...This gas is valuable because:"
- "...it's created through interaction..."
- "...What it does:** The regex will match any string that: 1. Can start with any characters..."

The phrase "it's" appears frequently as a starting point for discussions or descriptions. The `TOP_POSITIVE_LOGITS` do not seem to offer a coherent English pattern, but rather specific characters or jargon, suggesting the neuron might be sensitive to specific structural elements or beginnings of concepts, rather than semantic content itself. "it's" is a very common structural start to a descriptive clause or sentence.

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

 transgender  0.55
 scuole  0.54
 girls  0.54
 clogging  0.54
 teachers  0.51
 clogged  0.51
 corred  0.50
 achie  0.49
 פי  0.49
 manis  0.49

Positive Logits

0.47

mailbox

0.44

iax

0.43

æg

0.43

စျေး

0.43

ذ

0.42

ตำแหน่ง

0.42

bmod

0.42

缈

0.42

兆

0.41

Activations Density 0.000%
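
The density figure can be read as the share of token positions on which the feature fires. The sketch below is a minimal illustration under that assumption. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the dashboard; only the prompt count and prompt length come from the Configuration section.

```python
# Prompt set size as listed under Configuration.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
TOTAL_TOKENS = NUM_PROMPTS * TOKENS_PER_PROMPT  # 121,930,240 token positions


def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    `activations` is a list of prompts, each a list of per-token activation
    values. This definition of density is an assumption, not the dashboard's
    documented formula.
    """
    total = 0
    active = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Toy data (hypothetical): 2 prompts of 4 tokens each, one active position.
toy = [[0.0, 0.0, 1.3, 0.0], [0.0, 0.0, 0.0, 0.0]]
density = activation_density(toy)
print(f"{density:.3%}")  # 12.500%
```

If the density were computed over the full prompt set (again an assumption), a value that displays as 0.000% at three decimal places would correspond to fewer than roughly 610 active positions out of 121,930,240.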