    Explanations

    This neuron fires on the word “negative” when it appears as part of the hyphenated adjective “non-negative.”

    Negative Logits
     imported
    -0.07
     Burada
    -0.06
    ordes
    -0.06
    Reducer
    -0.06
    _backup
    -0.06
    AllWindows
    -0.06
    /menu
    -0.06
    ीदव
    -0.06
     Imported
    -0.05
     наличии
    -0.05
    Positive Logits
    (todo
    0.07
    OTOS
    0.07
     regions
    0.07
    lac
    0.07
    ending
    0.07
    าค
    0.07
    Craig
    0.06
     multiple
    0.06
    ends
    0.06
    0.06
    Act Density 0.002%

    No Known Activations