INDEX
    Explanations

    other remaining

    The neuron reliably fires on the word “other,” flagging mentions of “other” categories or items.

    Negative Logits
    "Hop"           -0.07
    " volunteer"    -0.07
    "ising"         -0.07
    " Poison"       -0.06
    " Woman"        -0.06
    "[level"        -0.06
    "USART"         -0.06
    " safety"       -0.06
    " Governor"     -0.06
    " recipe"       -0.06
    Positive Logits
    " zboží"        0.07
    "exas"          0.07
    " с"            0.06
    "’é"            0.06
    "дина"          0.06
    " bilinen"      0.06
    " contours"     0.06
    "ngr"           0.06
    " شرقی"         0.06
    " برنامه"       0.06
    Activation Density: 0.017%

    No Known Activations