INDEX
Explanations
The neuron activates when the word "people" appears. The subsequent tokens suggest descriptions or actions related to people, such as "jog" or "working". The positive logits show a diverse set of potential associations, but the most direct and unifying pattern is the presence of "people" itself.

Therefore, the most specific and accurate explanation based on the clear pattern in `MAX_ACTIVATING_TOKENS` and the common context in `TOP_ACTIVATING_TEXTS` is simply the word "people".

Explanation: people
Negative Logits
ag        1.64
nghiệm    1.46
ig        1.39
ut        1.38
aj        1.36
Alright   1.31
мате      1.30
めに      1.29
IN        1.27
ں         1.27
Positive Logits
ли            1.51
ление         1.50
hips          1.43
publiques     1.42
affiliated    1.36
autochtones   1.34
Algebras      1.32
olefins       1.32
nonexpansive  1.30
parishes      1.27
Activations Density 0.307%
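
As a rough illustration of how a density figure like the one above could be computed, the sketch below assumes that "activations density" means the fraction of sampled token positions on which the neuron's activation exceeds a threshold. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions for illustration and are not taken from this page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: density = (# positions with activation > threshold) / (# positions).
    """
    values = list(activations)
    if not values:
        # No sampled tokens: report zero density rather than dividing by zero.
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical sample: 1000 token positions, 3 of which fire.
sample = [0.0] * 997 + [2.1, 0.4, 5.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.307% would correspond to the neuron firing on roughly 3 of every 1000 sampled tokens.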