INDEX
    Explanations

    connections between cause and effect

    time or comparison prepositions

    The neuron detects discourse or stance markers: short words that signal emphasis, evaluation, comparison, or framing (e.g., "truth", "clear", "more/than", "for", "into", "able").

    Negative Logits
     Comprometido    -0.30
    ientôt           -0.28
     Aérea           -0.28
    seits            -0.27
     Außerdem        -0.27
     Савезне         -0.25
     Dänemark        -0.25
     Sementara       -0.25
     sebelah         -0.25
    })));            -0.25
    Positive Logits
    <unused79>    0.85
    <unused52>    0.85
    <unused74>    0.85
    <unused43>    0.85
    <unused47>    0.85
    <unused23>    0.85
    <unused41>    0.85
    <unused16>    0.85
    [@BOS@]       0.85
    <unused8>     0.85
    Activation Density 0.291%

    No Known Activations