INDEX
    Explanations

    statements followed by a negation

    negations or phrases indicating the absence of something

    Negative Logits
    former       -0.74
    çļ           -0.69
    Redditor     -0.69
    umbn         -0.69
    ourses       -0.69
    send         -0.67
    papers       -0.65
    rift         -0.64
    quet         -0.63
    aviour       -0.63
    Positive Logits
     uncommon        1.37
    icable           1.11
     clear           1.09
     unreasonable    1.09
     surprising      1.08
     necessarily     1.05
     easy            1.03
     advisable       0.99
     feasible        0.96
     unusual         0.95
    Activation Density: 0.079%

    No Known Activations