    Explanations

    words signaling certainty or emphasis

    the word "definitely" and its variations, indicating strong affirmation

    Negative Logits
    "acity"     -0.80
    "glers"     -0.79
    "mits"      -0.78
    "bestos"    -0.75
    "roups"     -0.75
    "lings"     -0.75
    "Reviewer"  -0.74
    "ufact"     -0.73
    "gencies"   -0.70
    "umbn"      -0.68
    Positive Logits
    " identifiable"    0.72
    " impacted"        0.70
    " qualifies"       0.70
    " differentiated"  0.67
    " Vader"           0.67
    " disqual"         0.66
    " deline"          0.66
    " detract"         0.66
    " benefited"       0.65
    " noticeable"      0.65
    Activation Density: 0.024%

    No Known Activations