    Explanations

    phrases indicating denial or negation

    negations and phrases indicating a lack of agreement or acceptance

    Negative Logits
    Token             Logit
    "inus"            -0.70
    "thus"            -0.66
    "folios"          -0.64
    "bath"            -0.62
    "shown"           -0.62
    "akeru"           -0.61
    "sold"            -0.60
    "press"           -0.60
    "ubi"             -0.58
    "rush"            -0.58
    Positive Logits
    Token             Logit
    " ones"           0.68
    " theoretically"  0.63
    "othes"           0.62
    " foreseeable"    0.61
    " Spoiler"        0.61
    "vised"           0.59
    "ptroller"        0.59
    "hap"             0.58
    " lawyers"        0.57
    "plom"            0.57
    Activation Density: 0.066%

    No Known Activations