INDEX
Explanations
summarization followed by punctuation
The neuron detects salient content-carrying words: important task or topic nouns and verbs (i.e., semantically informative tokens).
Negative Logits
tenang      0.24
existem     0.22
stratégie   0.22
déplacer    0.22
théorie     0.22
असून        0.21
utilisés    0.21
demasi      0.21
bruge       0.20
psychiat    0.20
Positive Logits
."            0.26
.")           0.26
.`            0.25
.”            0.25
。”            0.25
_.            0.24
."""          0.24
↵↵↵↵↵↵↵↵↵↵↵   0.24
".            0.24
。【            0.24
Activations Density 2.415%