INDEX
    Explanations

    negation followed by contrast

    Negative Logits
    Token           Logit
    " anything"     1.38
    " any"          1.30
    " Anything"     1.15
    " qualquer"     1.14
    "anything"      1.11
    " cualquier"    1.08
    "任何"          1.08
    " qualsiasi"    1.07
    " anyone"       1.06
    " ANYTHING"     1.04

    Positive Logits
    Token           Logit
    "Nor"           1.54
    " nor"          1.49
    " Nor"          1.36
    "nor"           1.12
    " ni"           0.97
    " nem"          0.92
    " Neither"      0.89
    "Neither"       0.89
    "也不是"        0.87
    "Nem"           0.87
    Activation Density: 0.130%

    No Known Activations