    Explanations

    This neuron primarily detects occurrences of the standalone token “New” (as in place names like “New York” or “New Jersey”).

    Negative Logits
    Token        Logit
    "veh"        -0.07
    " Democr"    -0.07
    "fcn"        -0.07
    " educ"      -0.07
    "/lic"       -0.07
                 -0.06
    " доч"       -0.06
    "rane"       -0.06
                 -0.06
    "fc"         -0.06

    Positive Logits
    Token        Logit
    " New"       0.09
    " numerical" 0.07
    " PET"       0.07
    " problem"   0.07
    " postav"    0.07
    " too"       0.06
    "estate"     0.06
    "_per"       0.06
    "ome"        0.06
    " getir"     0.06

    Activation Density: 0.033%

    No Known Activations