INDEX
    Explanations

    phrases contrasting different perspectives or approaches

    phrases or constructs that indicate negation or the idea of "not."

    Negative Logits
    Token            Logit
    "eur"            -0.74
    "kamp"           -0.72
    "stakes"         -0.69
    "ction"          -0.68
    "velt"           -0.64
    "oise"           -0.63
    "itor"           -0.63
    "hower"          -0.62
    "WER"            -0.62
    "itiz"           -0.61
    Positive Logits
    Token            Logit
    " necessarily"   1.39
    "icably"         1.33
    "epad"           1.14
    "icable"         1.06
    "withstanding"   0.97
    "orious"         0.96
    " bothering"     0.91
    "etheless"       0.90
    "eworthy"        0.85
    " exactly"       0.84
    Activation Density: 0.144%

    No Known Activations