Explanations

The neuron activates when the word "people" appears. The subsequent tokens suggest descriptions or actions related to people, such as "jog" or "working". The positive logits show a diverse set of potential associations, but the most direct and unifying pattern is the presence of "people" itself.

Therefore, the most specific and accurate explanation based on the clear pattern in `MAX_ACTIVATING_TOKENS` and the common context in `TOP_ACTIVATING_TEXTS` is simply the word "people".

Explanation: people

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token       Value
"ag"        1.64
" nghiệm"   1.46
"ig"        1.39
"ut"        1.38
"aj"        1.36
"Alright"   1.31
" мате"     1.30
"めに"      1.29
"IN"        1.27
"ں"         1.27

Positive Logits

Token             Value
"ли"              1.51
"ление"           1.50
"hips"            1.43
" publiques"      1.42
"affiliated"      1.36
" autochtones"    1.34
" Algebras"       1.32
" olefins"        1.32
" nonexpansive"   1.30
" parishes"       1.27

Activations Density 0.307%
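
To give the density figure a sense of scale, here is a minimal sketch. It rests on two assumptions the page does not confirm: that "density" means the fraction of token positions where the neuron's activation is above zero, and that it was measured over the prompt set listed under Configuration. The function names and the zero threshold are hypothetical and are not taken from the dashboard's code.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: the dashboard's density is defined this way; the page
    itself gives only the resulting percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def expected_active_tokens(num_prompts, tokens_per_prompt, density):
    """Rough count of activating tokens implied by a density value."""
    return round(num_prompts * tokens_per_prompt * density)


# Figures from the Configuration section and the density line.
total_tokens = 238_145 * 512
approx_active = expected_active_tokens(238_145, 512, 0.307 / 100)

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {approx_active:,}")
```

Under those assumptions, a density of 0.307% over roughly 122 million tokens corresponds to a few hundred thousand activating positions, which is plausible for a neuron keyed to a single common word.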