INDEX
    Explanations

    The neuron detects the prefix “Pre” at the start of words (i.e., tokens beginning with “Pre-”).

    Negative Logits
    Token         Logit
    " Elliot"     -0.08
    " Solomon"    -0.07
    "221"         -0.07
    " Lomb"       -0.07
    "jom"         -0.07
    " Donovan"    -0.07
    "oit"         -0.07
    "олит"        -0.07
    "olson"       -0.07
    " Joan"       -0.07

    Positive Logits
    Token         Logit
    " pre"        0.13
    " Pre"        0.12
    "Pre"         0.12
    ".Pre"        0.11
    ".pre"        0.11
    "_pre"        0.11
    "/pre"        0.09
    " PRE"        0.09
    "_Pre"        0.09
    " de"         0.09
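
    As a rough consistency check on the explanation, the sketch below tests how many of the top positive-logit tokens listed above begin with "pre" once leading whitespace and punctuation are stripped. The helper name has_pre_prefix and the normalization rule are assumptions made for illustration; they are not part of the dashboard or of any auto-interp tooling.

```python
import re

# Top positive-logit tokens copied from the listing above (leading spaces kept).
positive_tokens = [" pre", " Pre", "Pre", ".Pre", ".pre",
                   "_pre", "/pre", " PRE", "_Pre", " de"]

def has_pre_prefix(token: str) -> bool:
    """Hypothetical helper: True if the token starts with 'pre'
    (case-insensitive) after dropping leading whitespace, punctuation,
    and underscores."""
    core = re.sub(r"^[\W_]+", "", token)
    return core.lower().startswith("pre")

matches = [t for t in positive_tokens if has_pre_prefix(t)]
non_matches = [t for t in positive_tokens if not has_pre_prefix(t)]

print(f"{len(matches)} of {len(positive_tokens)} tokens match; exceptions: {non_matches}")
```

    Under this normalization, " de" is the only listed token that does not fit the pattern.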
    Activation Density: 0.046%

    No Known Activations