    Explanations

    The neuron fires on negative-sentiment adjectives (e.g., “bad,” “worse,” “evil”).

    Negative Logits
     adidas     -0.07
     consort    -0.07
     Clo        -0.06
     Schn       -0.06
    Thông       -0.06
    etta        -0.06
    주시        -0.06
    errmsg      -0.06
    ont         -0.06
    GEST        -0.06
    Positive Logits
     concluded  0.06
     de         0.06
    .Area       0.06
    无码        0.06
    instances   0.06
     wb         0.06
    =out        0.06
    _literals   0.06
    crollView   0.06
    ,↵↵↵↵       0.06
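    The page does not say how the logit values above were computed. One common approach, assumed here rather than confirmed by the source, is direct logit attribution: take the neuron's output weight vector, multiply it by the unembedding matrix W_U, and list the tokens whose logits it lowers and raises most. The helper name `top_logit_effects` and its parameters are hypothetical, and this sketch ignores the final LayerNorm and any later layers.

```python
from typing import List, Tuple


# Hypothetical helper (not from the source page).
# neuron_out: the neuron's output weight vector, length d_model.
# unembed:    the unembedding matrix W_U as d_model rows of n_vocab columns.
def top_logit_effects(
    neuron_out: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive direct logit effects."""
    if len(unembed) != len(neuron_out):
        raise ValueError("unembed must have one row per entry of neuron_out")
    for row in unembed:
        if len(row) != len(vocab):
            raise ValueError("each unembed row must have one entry per token")

    # Effect on token t's logit: dot product of neuron_out with column t of W_U.
    # Assumption: no LayerNorm scaling and no downstream layers are modeled.
    effects = [
        sum(w * row[t] for w, row in zip(neuron_out, unembed))
        for t in range(len(vocab))
    ]

    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    most_negative = ranked[:k]        # ascending order: most negative first
    most_positive = ranked[::-1][:k]  # descending order: most positive first
    return most_negative, most_positive


if __name__ == "__main__":
    # Toy model: d_model = 2, vocabulary of 3 tokens.
    neg, pos = top_logit_effects(
        [1.0, -1.0],
        [[0.5, 0.0, 0.2], [0.1, 0.3, 0.0]],
        ["a", "b", "c"],
        k=1,
    )
    print("negative:", neg)
    print("positive:", pos)
```

    Under this reading, the near-uniform magnitudes above (all within ±0.07) would mean the neuron's output direction has only a weak, diffuse direct effect on any single token's logit.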
    Act Density 0.019%
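    If "Act Density" is read as the percentage of sampled token positions on which the neuron's activation exceeds a threshold (an assumption; the page gives neither the threshold nor the sample size), then 0.019% works out to roughly one firing per 5,300 tokens (100 / 0.019 ≈ 5,263). A minimal sketch of that calculation, using a hypothetical `activation_density` helper and assuming a default threshold of 0.0:

```python
from typing import Sequence


# Hypothetical helper (not from the source page). "Active" is assumed to mean
# activation > threshold; the real dashboard's threshold is not stated.
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions on which the neuron is active."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 19 firing positions out of 100,000 tokens corresponds to 0.019%.
    acts = [1.0] * 19 + [0.0] * (100_000 - 19)
    print(activation_density(acts))
```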

    No Known Activations