INDEX
Explanations
connections between cause and effect
time or comparison prepositions
The neuron detects discourse or stance markers: short words that signal emphasis, evaluation, comparison, or framing (e.g., "truth", "clear", "more/than", "for", "into", "able").
Negative Logits
Comprometido  -0.30
ientôt        -0.28
Aérea         -0.28
seits         -0.27
Außerdem      -0.27
Савезне       -0.25
Dänemark      -0.25
Sementara     -0.25
sebelah       -0.25
})));         -0.25
Positive Logits
<unused79>    0.85
<unused52>    0.85
<unused74>    0.85
<unused43>    0.85
<unused47>    0.85
<unused23>    0.85
<unused41>    0.85
<unused16>    0.85
[@BOS@]       0.85
<unused8>     0.85
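The negative and positive logit lists can be read as the neuron's direct effect on each output token. Below is a minimal sketch of how such lists are commonly produced, assuming (this page does not confirm it) that each value is the dot product of the neuron's output weight vector with the matching column of the model's unembedding matrix. The names `top_logit_effects`, `w_out`, and `W_U` are hypothetical, and the toy numbers are illustrative only, not taken from this neuron.

```python
import numpy as np


def top_logit_effects(w_out, W_U, vocab, k=10):
    """Rank vocabulary tokens by a neuron's direct effect on the output logits.

    Assumed (hypothetical) shapes:
      w_out: (d_model,) output weight vector of the neuron
      W_U:   (d_model, vocab_size) unembedding matrix
      vocab: list of vocab_size token strings
    Returns (most_negative, most_positive), each a list of (token, value) pairs.
    """
    effects = np.asarray(w_out) @ np.asarray(W_U)  # shape: (vocab_size,)
    # Stable sort so tied values (e.g. many identical scores) keep a deterministic order.
    order = np.argsort(effects, kind="stable")
    most_negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    most_positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return most_negative, most_positive


# Toy example: 3-dimensional residual stream, 5-token vocabulary.
vocab = ["tok0", "tok1", "tok2", "tok3", "tok4"]
w_out = np.array([1.0, 0.0, -1.0])
W_U = np.array([
    [0.2, -0.1, 0.5, 0.0, 0.9],
    [0.0, 0.3, 0.0, 0.1, 0.0],
    [0.1, 0.4, -0.2, 0.3, 0.05],
])
neg, pos = top_logit_effects(w_out, W_U, vocab, k=2)
print("most negative:", neg)
print("most positive:", pos)
```

Ties, such as the identical 0.85 values in the positive list, are broken here by a stable sort; a real dashboard may order tied tokens differently.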
Activation Density: 0.291%
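Activation density summarizes how often the neuron fires. A minimal sketch, assuming the figure is the percentage of evaluated tokens whose activation exceeds zero; the actual threshold and token sample behind 0.291% are not stated here, and `activation_density` with its `threshold` parameter is a hypothetical helper.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which a neuron counts as active.

    Assumes (hypothetically) that "active" means activation > threshold,
    with threshold = 0 by default.
    """
    acts = np.asarray(activations, dtype=float).ravel()
    if acts.size == 0:
        raise ValueError("need at least one activation")
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy example: 291 active tokens out of 100,000 gives 0.291%.
acts = np.zeros(100_000)
acts[:291] = 1.5
print(f"{activation_density(acts):.3f}%")  # prints 0.291%
```

Under this reading, a density of 0.291% means the neuron is active on roughly 3 of every 1,000 tokens.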