    Explanations

    negation terms or phrases such as "nor" and "neither."

    Negative Logits
    "fwd"        -0.68
    "pezi"       -0.66
    "ittens"     -0.66
    " ARAB"      -0.65
    " SAX"       -0.62
    "wiście"     -0.60
    " bulk"      -0.60
    " Brag"      -0.59
    " Turn"      -0.58
                 -0.58
    Positive Logits
    "neither"     1.24
    "Nor"         1.20
    " neither"    1.18
    " nor"        1.17
    "nor"         1.12
    " NOR"        1.12
    "Neither"     1.09
    " Neither"    1.08
    " Nor"        1.07
    "theless"     1.05
    Activation Density: 0.068%

    No Known Activations