INDEX
    Explanations

    phrases related to negations and prohibitions

    negative contractions related to denial or refusal

    Negative Logits
    Token          Logit
    "ersed"        -0.67
    "accompan"     -0.66
    " Darling"     -0.63
    " Older"       -0.61
    "lined"        -0.60
    " Xuan"        -0.59
    "higher"       -0.59
    "anni"         -0.58
    "HAHA"         -0.58
    "ranged"       -0.57
    Positive Logits
    Token          Logit
    " condone"      1.25
    " tolerate"     1.07
    " ourselves"    0.99
    " know"         0.91
    "yet"           0.90
    " want"         0.90
    " expect"       0.89
    " need"         0.88
    " intend"       0.87
    "ird"           0.87
    Activation Density: 0.100%

    No Known Activations