INDEX
    Explanations

    The neuron fires on occurrences of the word “names” (as in “the names of…”).

    Negative Logits
    ойно
    -0.06
    -0.06
     isNew
    -0.06
     Тур
    -0.05
    Iteration
    -0.05
     fetish
    -0.05
    antro
    -0.05
     InkWell
    -0.05
    Expose
    -0.05
     precondition
    -0.05
    Positive Logits
    $core
    0.07
    ANCELED
    0.07
     Certified
    0.07
    ","
    0.07
     bombings
    0.07
    ิย
    0.07
    #[
    0.07
     fromDate
    0.07
    \",\
    0.06
     defe
    0.06
    Activation Density: 0.001%

    No Known Activations