    Explanations

    This neuron fires on the first content word at the start of a new sentence or segment.

    Negative Logits
    уры           -0.07
    slu           -0.07
     impost       -0.07
                  -0.06
    арам          -0.06
    jr            -0.06
     Neville      -0.06
    nelle         -0.06
    او            -0.06
     wastewater   -0.06
    Positive Logits
    Towards       0.08
    лючается      0.07
    \brief        0.07
     TOR          0.06
     (^           0.06
    <=$           0.06
    ustrial       0.06
     Above        0.06
                  0.06
     فصل          0.06
    Activation Density: 0.079%

    No Known Activations