INDEX
    Explanations

    Negative contractions, particularly "doesn't."

    Negative Logits
    "toBe"          -0.71
    " varit"        -0.61
    " wären"        -0.60
    " թվական"       -0.58
    "wares"         -0.57
    "aternary"      -0.56
    "اعد"           -0.55
    " seront"       -0.54
    "ریح"           -0.53
    " sebelumnya"   -0.53
    Positive Logits
    " do"           0.92
    " DO"           0.91
    "httphttps"     0.88
    " does"         0.86
    " Does"         0.85
    " DOES"         0.84
    "DOES"          0.82
    "does"          0.81
    " didst"        0.80
    "ും"            0.77
    Activation Density: 0.239%

    No Known Activations