INDEX
    Explanations

    punctuation marks and expressions of enthusiasm or surprise

    Negative Logits
    Token         Logit
    "ndef"        -0.17
    ".TestCase"   -0.16
    "strup"       -0.16
    "iga"         -0.15
    "ivre"        -0.15
    "ypse"        -0.15
    "vos"         -0.15
    "ãĤ¤ãĤº"      -0.14
    "sw"          -0.14
    "elles"       -0.14
    Positive Logits
    Token         Logit
    " tweeted"    0.23
    " tweets"     0.23
    " Tweets"     0.22
    ".@"          0.22
    " RT"         0.21
    " tweet"      0.21
    "@nate"       0.20
    " twitter"    0.20
    "retweeted"   0.19
    " twe"        0.18
    Activation Density: 0.027%

    No Known Activations