    Explanations

    references to recent events and statements made in public forums

    Negative Logits
    Token          Logit
    "/-"           -0.80
    ")/"           -0.72
    "agogue"       -0.69
    "oyal"         -0.68
    "enfranch"     -0.68
    "orno"         -0.68
    "illusion"     -0.66
    "hov"          -0.64
    "ovan"         -0.63
    "ibaba"        -0.63

    Positive Logits
    Token          Logit
    " remarks"      1.03
    " tweets"       0.97
    " tweeted"      0.93
    " stating"      0.91
    " remark"       0.89
    " comments"     0.88
    " sarcast"      0.87
    " tweeting"     0.86
    " tweet"        0.86
    " quotes"       0.85
    Activation Density: 0.296%

    No Known Activations