    Explanations

    phrases indicating probability or likelihood

    Negative Logits
    Token           Logit
    "idum"          -0.70
    "providedIn"    -0.67
    "Фор"           -0.64
    "ByVersion"     -0.63
    " pstmt"        -0.63
    "andaag"        -0.62
    " everybody"    -0.62
    "Everybody"     -0.61
    "twimg"         -0.61
    " Schwe"        -0.61
    Positive Logits
    Token           Logit
    " likely"       2.99
    "likely"        2.77
    " Likely"       2.77
    "Likely"        2.52
    " LIK"          1.72
    " unlikely"     1.64
    "unlikely"      1.58
    " likelihood"   1.57
    " Likelihood"   1.41
    "likelihood"    1.36
    Activation Density: 0.063%

    No Known Activations