    Explanations

    phrases related to claims, beliefs, and statements of certainty

    expressions of belief or claims regarding factual statements

    Negative Logits
    ratulations    -0.72
    ntil           -0.66
    perty          -0.64
    Reply          -0.62
    entanyl        -0.62
    ————           -0.59
    dding          -0.59
    endment        -0.59
    essen          -0.58
    untled         -0.57
    Positive Logits
     constitutes    1.08
     represents     1.01
     belongs        1.01
     deserves       1.00
     qualifies      0.97
     proves         0.93
     could          0.91
     resembles      0.91
     amounted       0.91
     contains       0.90
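    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A minimal sketch of how such lists are commonly produced, assuming the scores come from projecting the feature's decoder direction through the model's unembedding matrix (this page does not state its exact method or any normalization it applies). The names `logit_effects` and `top_tokens` and the toy weights are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix W_U laid out as d_model rows x vocab_size columns.
    Returns one logit effect per vocabulary token."""
    d_model = len(decoder_dir)
    vocab_size = len(W_U[0])
    return [sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
            for j in range(vocab_size)]

def top_tokens(decoder_dir: Sequence[float],
               W_U: Sequence[Sequence[float]],
               vocab: Sequence[str],
               k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative): the k tokens with the largest and the
    k tokens with the smallest logit effects, strongest first."""
    effects = logit_effects(decoder_dir, W_U)
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Hypothetical toy example: d_model = 2, vocabulary of 4 tokens.
vocab = [" constitutes", " could", "ratulations", "ntil"]
decoder_dir = [1.0, 0.5]
W_U = [[1.0, 0.4, -0.6, -0.2],
       [0.2, 0.6, -0.2, -0.8]]
positive, negative = top_tokens(decoder_dir, W_U, vocab, k=2)
print([tok for tok, _ in positive])
print([tok for tok, _ in negative])
```

    Note that tokens keep their leading space (" constitutes" versus "ratulations"), because in most tokenizers a leading space marks a word-initial token and the two forms are distinct vocabulary entries.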
    Activation Density 0.125%
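    Activation density reads as the share of sampled tokens on which the feature is active; 0.125% works out to roughly 1 token in 800. A minimal sketch of that arithmetic, assuming "active" means an activation strictly above zero (the threshold this page uses is not stated). The function name `activation_density` is hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one token")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# One active token out of 800 sampled tokens gives 0.125%.
acts = [0.0] * 799 + [2.3]
print(activation_density(acts))
```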

    No Known Activations