INDEX
    Explanations

    This neuron activates on negated effects: phrases stating that something “did not” occur or had “no” effect.

    Negative Logits
                    -0.06
    undles          -0.06
     Dalton         -0.06
    uforia          -0.06
     abyss          -0.06
     alumnos        -0.06
    えて            -0.06
    ثمان            -0.06
    dbe             -0.06
    Footer          -0.06
    Positive Logits
    ㅋㅋ            0.07
    ponge           0.06
     Turkish        0.06
    ./              0.06
     typename       0.06
    .Decimal        0.06
     popul          0.06
                    0.06
     breadcrumbs    0.06
    раниц           0.06
    Act Density 0.026%

    No Known Activations