    Explanations

    This neuron detects words and contractions that express speaker stance—particularly modal verbs, negations, and intensifying adverbs (e.g. “can,” “not,” “’re,” “absolutely,” “won’t,” “’ll”).

    Negative Logits
    " Fort"        -0.06
    "/nav"         -0.06
    "43"           -0.06
    " επί"         -0.06
    "What"         -0.06
    " heaters"     -0.06
    " caracter"    -0.06
    "13"           -0.05
    "-ish"         -0.05
    "_launcher"    -0.05
    Positive Logits
    "ANNOT"        0.07
    "couldn"       0.07
    " couldn"      0.07
    (not shown)    0.07
    " erot"        0.06
    ".vec"         0.06
    " DEFIN"       0.06
    " hadn"        0.06
    "ENCED"        0.06
    ":invoke"      0.06
    Act Density 0.092%

    No Known Activations