INDEX
    Explanations

    The neuron activates on negation words (e.g., “not”).

    Negative Logits
     داو            -0.06
    Deprecated      -0.06
     smoothly       -0.06
    ;y              -0.06
    ?option         -0.06
     notifies       -0.06
    arı             -0.06
    ếu              -0.06
    uestos          -0.06
     numberOfRows   -0.06
    Positive Logits
    Nine            0.07
    .Master         0.07
     Terra          0.07
     ejac           0.06
    关系              0.06
     eclectic       0.06
     соп            0.06
                    0.06
    уз              0.06
    _fname          0.06
    Activation Density: 0.005%

    No Known Activations