INDEX
    Explanations

    phrases expressing negation or exclusion

    negations indicating opposition or resistance to various ideas and actions

    NEGATIVE LOGITS
    "eatures"         -0.71
    "ilar"            -0.71
    "atural"          -0.68
    " Annotations"    -0.65
    "verified"        -0.64
    " legit"          -0.63
    "INAL"            -0.62
    "lycer"           -0.62
    "mentioned"       -0.62
    "ensis"           -0.62
    POSITIVE LOGITS
    " compl"           1.02
    " succumb"         0.96
    " shortcuts"       0.91
    " tolerate"        0.91
    " scapego"         0.91
    " hesitate"        0.90
    " compromises"     0.90
    " compromise"      0.88
    " excuses"         0.88
    " shy"             0.87
    Act Density 0.318%

    No Known Activations