INDEX
    Explanations

    tweets from various individuals

    instances of the word "tweeted."

    Negative Logits
    Token         Logit
    "cum"         -0.71
    "phal"        -0.70
    "cised"       -0.69
    "arist"       -0.67
    "vernment"    -0.67
    "pure"        -0.65
    "neg"         -0.61
    "circ"        -0.61
    "istries"     -0.61
    "bred"        -0.60
    Positive Logits
    Token         Logit
    " tweets"     0.92
    "storms"      0.89
    " tweet"      0.87
    " Tweet"      0.86
    " hasht"      0.82
    " hashtag"    0.82
    "nesday"      0.80
    " tweeted"    0.79
    "Tweet"       0.79
    " tweeting"   0.78
    Activation Density: 0.016%

    No Known Activations